Change search
ReferencesLink to record
Permanent link

Direct link
Optimizing Sparse Matrix Assembly in Finite Element Solvers with One-Sided Communication
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).ORCID iD: 0000-0002-5020-1631
2013 (English)In: High Performance Computing for Computational Science - VECPAR 2012, Springer Berlin/Heidelberg, 2013, 128-139 p.Conference paper (Refereed)
Abstract [en]

In parallel finite element solvers, sparse matrix assembly is often a bottleneck. Implemented using message passing, latency from message matching starts to limit performance as the number of cores increases. We here address this issue by using our own stack based representation of the sparse matrix, and a hybrid parallel programming model combining traditional message passing with one-sided communication. This gives an significantly faster insertion rate compared to state of the art implementations on a Cray XE6.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013. 128-139 p.
, Lecture Notes in Computer Science, ISSN 0302-9743 ; 7851
Keyword [en]
UPC, PGAS, Hybrid Parallel Programming
National Category
Computational Mathematics
URN: urn:nbn:se:kth:diva-125739DOI: 10.1007/978-3-642-38718-0_15ISI: 000342997100015ScopusID: 2-s2.0-84883275982ISBN: 978-3-642-38717-3OAI: diva2:640313
10th International Conference on High Performance Computing for Computational Science, VECPAR 2012; Kobe; Japan; 17 July 2012 through 20 July 2012

QC 20130815

Available from: 2013-08-13 Created: 2013-08-13 Last updated: 2014-11-13Bibliographically approved
In thesis
1. High Performance Adaptive Finite Element Methods: With Applications in Aerodynamics
Open this publication in new window or tab >>High Performance Adaptive Finite Element Methods: With Applications in Aerodynamics
2013 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The massive computational cost for resolving all scales in a turbulent flow makes a direct numerical simulation of the underlying Navier-Stokes equations impossible in most engineering applications. Recent advances in adaptive finite element methods offer a new powerful tool in Computational Fluid Dynamics (CFD). The computational cost for simulating turbulent flow can be minimized by adaptively resolution of the mesh, based on a posteriori error estimation. Such adaptive methods have previously been implemented for efficient serial computations, but the extension to an efficient parallel solver is a challenging task. This work concerns the development of an adaptive finite element method that enables efficient computation of time resolved approximations of turbulent flow for complex geometries with a posteriori error control. We present efficient data structures and data decomposition methods for distributed unstructured tetrahedral meshes. Our work also concerns an efficient parallelization of local mesh refinement methods such as recursive longest edge bisection, and the development of an a priori predictive dynamic load balancing method, based on a weighted dual graph. We also address the challenges of emerging supercomputer architectures with the development of new hybrid parallel programming models, combining traditional message passing with lightweight one-sided communication. Our implementation has proven to be both general and efficient, scaling up to more than twelve thousands cores.

Abstract [sv]

Den höga beräkningskostnaden för att lösa upp alla turbulenta skalor för ett realistiskt problem gör en direkt numerisk simulering av Navier-Stokes ekvationer omöjlig. De senaste framstegen inom adaptiva finita element metoder ger ett nytt kraftfullt verktyg inom Computational Fluid Dynamics (CFD). Beräkningskostnaden för en simulering av turbulent flöde kan minimeras genom att beräkningsnätet adaptivt förfinas baserat på en a posteriori feluppskattning. Dessa adaptiva metoder har tidigare implementerats för seriella beräkningar, medan en effektiv parallellisering av metoden inte är trivial. I denna avhandling presenterar vi vår utveckling av en adaptiv finita element lösare, anpassad för att effektivt beräkna tidsupplösta approximationer i komplicerade geometrier med a posteriori felkontroll. Effektiva datastrukturer och metoder för ostrukturerade beräkningsnät av tetrahedrar presenteras. Avhandlingen behandlar även effektiv parallellisering av lokala nätförfiningsmetoder, exempelvis recursive longest edge bisection. Även lastbalanseringsproblematiken behandlas, där problemet lösts genom utvecklandet av en prediktiv dynamisk lastbalanseringsmetod, baserad på en viktad dualgraf av beräkningsnätet. Slutligen avhandlas även problematiken med att effektivt utnyttja nytillkomna superdatorarkitekturer, genom utvecklandet av en hybrid parallelliserings modell som kombinerar traditionell meddelande baserad parallellisering med envägskommunikation. Detta har resulterat i en generell samt effektiv implementation med god skalning upp till fler än tolv tusen processorkärnor.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. xii, 50 p.
TRITA-CSC-A, ISSN 1653-5723 ; 2013:07
National Category
Computational Mathematics
urn:nbn:se:kth:diva-125742 (URN)978-91-7501-814-0 (ISBN)
Public defence
2013-09-11, F3, Lindstedtsvägen 26, KTH, Stockholm, 10:15 (English)

QC 20130816

Available from: 2013-08-16 Created: 2013-08-13 Last updated: 2016-02-02Bibliographically approved

Open Access in DiVA

fulltext(215 kB)1186 downloads
File information
File name FULLTEXT01.pdfFile size 215 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Other links

Publisher's full textScopusSpringer

Search in DiVA

By author/editor
Jansson, Niclas
By organisation
High Performance Computing and Visualization (HPCViz)
Computational Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 1186 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 157 hits
ReferencesLink to record
Permanent link

Direct link