1 - 10 of 10
  • 1.
    Akhmetova, Dana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Cebamanos, L.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Rotaru, T.
    Rahn, M.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Bartsch, V.
    Simmendinger, C.
    Interoperability of GASPI and MPI in large scale scientific applications. 2018. In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Springer Verlag, 2018, p. 277-287. Conference paper (Refereed)
    Abstract [en]

    One of the main hurdles to broad adoption of PGAS approaches is the prevalence of MPI, which, as a de facto standard, appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by presenting our study and preliminary performance results on interoperating GASPI and MPI in the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.

  • 2.
    Al Ahad, Muhammed Abdullah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Simmendinger, Christian
    T Syst Solut Res GmbH, D-70563 Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows. 2018. In: Proceedings of PAW-ATM18: 2018 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI (PAW-ATM), IEEE, 2018, p. 1-10. Conference paper (Refereed)
    Abstract [en]

    Collective operations are commonly used in various parts of scientific applications. Especially in strong-scaling scenarios, collective operations can negatively impact overall application performance: while the load per rank decreases with increasing core counts, the time spent in e.g. barrier operations increases logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploiting the benefits of this extension, we deliver high-performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations achieve 2x-4x performance improvements compared to the best-performing MPI implementations for various data distributions.
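The strong-scaling argument in this abstract — per-rank load shrinks as 1/P while a tree-based barrier grows as log2 P — can be sketched with a toy cost model. All constants below are assumed, illustrative values, not measurements from the paper:

```python
import math

def time_per_iteration(total_work, ranks, t_flop=1e-9, t_barrier_hop=1e-6):
    """Toy strong-scaling model: compute time shrinks as total_work / ranks,
    while a tree-based barrier pays one hop per tree level, ceil(log2(ranks))."""
    compute = (total_work / ranks) * t_flop
    barrier = math.ceil(math.log2(ranks)) * t_barrier_hop
    return compute + barrier

# With the total work fixed, adding ranks first helps, but eventually the
# logarithmic barrier term dominates and scaling flattens out.
for p in (64, 1024, 16384, 262144, 2097152):
    print(p, time_per_iteration(1e8, p))
```

In this model the barrier's share of each iteration grows with the rank count, which is exactly the regime the shared-window collectives above target.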

  • 3. Iakymchuk, Roman
    Reproducibility-Assuring Strategies for Parallel Preconditioned Conjugate Gradient. Manuscript (preprint) (Other academic)
    Abstract [en]

    The Preconditioned Conjugate Gradient method is often used in numerical simulations ranging from gyro-fluid to space plasma physics. While widely used, the solver is also known for its lack of accuracy when computing the residual. In this article, we pursue a twofold goal: enhancing the accuracy of the solver while also ensuring its reproducibility in a message-passing implementation. We design and employ various strategies, from the ExBLAS approach (preserving every bit of information until the final rounding) to a more lightweight, performance-oriented strategy (expanding the intermediate precision). These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these strategies on modern HPC systems with up to 768 processes.
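The ExBLAS-style strategy described here — preserve every bit of information and round only once at the end — can be illustrated with Python's standard-library `math.fsum`, which likewise computes an exactly rounded sum and therefore returns the same result for every summation order. The data values are illustrative and unrelated to the paper:

```python
import math

def naive_sum(values):
    """Left-to-right floating-point sum: the result depends on operand order."""
    total = 0.0
    for v in values:
        total += v
    return total

# Two orderings of the same data set.
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1e16, -1e16]

# Naive summation loses a 1.0 in the first ordering: (1e16 + 1.0) rounds
# back to 1e16, so the two orders disagree.
print(naive_sum(a))   # 1.0
print(naive_sum(b))   # 2.0

# math.fsum tracks the exact sum (Shewchuk's algorithm) and rounds once
# at the end, so every ordering yields the same correctly rounded result.
print(math.fsum(a))   # 2.0
print(math.fsum(b))   # 2.0
```

A message-passing reduction has the same order-dependence problem as `naive_sum`, which is why bitwise reproducibility requires an exactly rounded (or extended-precision) accumulation strategy.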

  • 4.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Jordan, Herbert
    University of Innsbruck, Institute of Computer Science.
    Bo Peng, Ivy
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    A Particle-in-Cell Method for Automatic Load-Balancing with the AllScale Environment. 2016. Conference paper (Other academic)
    Abstract [en]

    We present an initial design and implementation of a Particle-in-Cell (PIC) method based on the work carried out in the European Exascale project AllScale. AllScale provides a unified programming system for the effective development of highly scalable, resilient, and performance-portable parallel applications for Exascale systems. The AllScale approach is based on task-based nested recursive parallelism, and it provides mechanisms for automatic load-balancing in PIC simulations. We provide the preliminary results of the AllScale-based PIC implementation and outline directions for its future development.

  • 5.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Shakhno, S. M.
    Yarmola, H. P.
    Convergence Analysis of a Two-Step Modification of the Gauss-Newton Method and Its Applications. 2017. In: Journal of Numerical and Applied Mathematics, ISSN 0868-6912, Vol. 3, no 126, p. 61-74. Article in journal (Refereed)
    Abstract [en]

    We investigate the convergence of a two-step modification of the Gauss-Newton method under a generalized Lipschitz condition for the first- and second-order derivatives. We study the convergence order as well as the convergence radius of the method, and we examine the uniqueness ball of the solution of the nonlinear least-squares problem. Finally, we carry out numerical experiments on a set of well-known test problems.
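For reference, the classical (one-step) Gauss-Newton iteration that the two-step method modifies solves a nonlinear least-squares problem by repeatedly linearizing the residual. A minimal one-parameter sketch follows; the exponential model and the data are illustrative assumptions, and this is the textbook iteration, not the authors' modified scheme:

```python
import math

def gauss_newton_1d(residual, jacobian, x0, iters=30):
    """Classical Gauss-Newton for a single unknown:
    x_{k+1} = x_k - (J^T r) / (J^T J), with r and J evaluated at x_k."""
    x = x0
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        jtj = sum(j * j for j in J)
        jtr = sum(j * ri for j, ri in zip(J, r))
        x -= jtr / jtj
    return x

# Illustrative zero-residual problem: fit y = exp(a * t) to exact data.
t = [0.0, 1.0, 2.0, 3.0]
y = [math.exp(0.5 * ti) for ti in t]       # generated with a = 0.5

residual = lambda a: [math.exp(a * ti) - yi for ti, yi in zip(t, y)]
jacobian = lambda a: [ti * math.exp(a * ti) for ti in t]

a_hat = gauss_newton_1d(residual, jacobian, x0=0.0)
print(a_hat)  # converges to 0.5
```

Two-step variants of this scheme reuse or modify the linearization across a pair of substeps to reduce derivative evaluations or improve the convergence order, which is the subject of the analysis above.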

  • 6.
    Markidis, Stefano
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Iakymchuk, Roman
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Kestor, G.
    Gioiosa, R.
    A performance characterization of streaming computing on supercomputers. 2016. In: Procedia Computer Science, Elsevier, 2016, p. 98-107. Conference paper (Refereed)
    Abstract [en]

    Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are more and more used on supercomputers to solve data-intensive problems. Because supercomputers have mainly been used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating-point operations per unit time and cannot fully characterize the performance of a streaming application on supercomputers. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.
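The injection and processing rates proposed here are, in essence, sustained event rates measured after the initial transient. A minimal sketch with synthetic, hand-picked timestamps (the fixed `skip` cutoff for the transient is an illustrative simplification, not the paper's methodology):

```python
def sustained_rate(timestamps, skip=2):
    """Events per second over the steady phase, discarding the first
    `skip` events as the start-up transient."""
    steady = timestamps[skip:]
    return (len(steady) - 1) / (steady[-1] - steady[0])

# Synthetic timelines (seconds): a slow start-up, then one event every
# 0.25 s, i.e. a sustained rate of 4 events/s on both sides of the stream.
injection_times  = [0.0, 0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25]   # producer
processing_times = [0.25, 0.75, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5]  # consumer

print(sustained_rate(injection_times))   # 4.0
print(sustained_rate(processing_times))  # 4.0
```

When the sustained processing rate falls below the sustained injection rate, data accumulates between producers and consumers, which is why the two rates together characterize a streaming workload.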

  • 7.
    Simmendinger, Christian
    et al.
    T Syst Solut Res, Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Cebamanos, Luis
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Akhmetova, Dana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Bartsch, Valeria
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Rotaru, Tiberiu
    Fraunhofer ITWM, Kaiserslautern, Germany.
    Rahn, Mirko
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden; KTH Royal Inst Technol, PDC Ctr, High Performance Comp Ctr, Stockholm, Sweden.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden.
    Interoperability strategies for GASPI and MPI in large-scale scientific applications. 2019. In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no 3, p. 554-568. Article in journal (Refereed)
    Abstract [en]

    One of the main hurdles for partitioned global address space (PGAS) approaches is the dominance of the message passing interface (MPI), which, as a de facto standard, appears in the code base of many applications. To take advantage of PGAS APIs like the global address space programming interface (GASPI) without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication- and performance-crucial parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.

  • 8.
    Thoman, Peter
    et al.
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Dichev, Kiril
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    Heller, Thomas
    Univ Erlangen Nurnberg, D-91058 Erlangen, Germany.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Hasanov, Khalid
    IBM Ireland, Dublin 15, Ireland.
    Gschwandtner, Philipp
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin 15, Ireland.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH Royal Inst Technol, S-10044 Stockholm, Sweden.
    Jordan, Herbert
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Fahringer, Thomas
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Katrinis, Kostas
    IBM Ireland, Dublin 15, Ireland.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    A taxonomy of task-based parallel programming technologies for high-performance computing. 2018. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no 4, p. 1422-1434. Article in journal (Refereed)
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the rise of parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 9.
    Thoman, Peter
    et al.
    Univ Innsbruck, Innsbruck, Austria.
    Hasanov, Khalid
    IBM Ireland, Dublin, Ireland.
    Dichev, Kiril
    Queens Univ Belfast, Belfast, Antrim, North Ireland.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH.
    Gschwandtner, Philipp
    Univ Innsbruck, Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin, Ireland.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Jordan, Herbert
    Univ Innsbruck, Innsbruck, Austria.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Katrinis, Kostas
    IBM Ireland, Dublin, Ireland.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast, Antrim, North Ireland.
    Fahringer, Thomas
    Univ Innsbruck, Innsbruck, Austria.
    A Taxonomy of Task-Based Technologies for High-Performance Computing. 2018. In: Parallel Processing and Applied Mathematics (PPAM 2017), Part II / [ed] Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K., Springer International Publishing AG, 2018, p. 264-274. Conference paper (Refereed)
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the rise of heterogeneous, many-core, and parallel systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing, no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 10. Wiesenberger, M.
    et al.
    Einkemmer, L.
    Held, M.
    Gutierrez-Milla, A.
    Sáez, X.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Reproducibility, accuracy and performance of the FELTOR code and library on parallel computer architectures. 2019. In: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 238, p. 145-156. Article in journal (Refereed)
    Abstract [en]

    FELTOR is a modular and free scientific software package. It allows developing platform independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed memory systems. FELTOR consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behavior. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behavior translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. We briefly discuss alternative methods to ensure the correctness of results like the convergence of reduced physical quantities of interest, ensemble simulations, invariants or reduced simulation times. In a second part, we explore important performance tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in FELTOR and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 10⁻¹ and 10³ MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node to achieve a scaling efficiency above 50% (both strong and weak).
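The closing claim — that the product of latency and bandwidth sets the minimum array size per node for a scaling efficiency above 50% — follows directly from a latency-bandwidth execution-time model T(n) = t_lat + n/BW: the bandwidth term accounts for at least half of T(n) exactly when n ≥ t_lat · BW. A small sketch with assumed, illustrative hardware numbers (not the paper's measurements):

```python
def exec_time(n_bytes, t_lat, bw):
    """Latency-bandwidth model: T(n) = t_lat + n / bw."""
    return t_lat + n_bytes / bw

def efficiency(n_bytes, t_lat, bw):
    """Fraction of T(n) spent on useful data transfer."""
    return (n_bytes / bw) / exec_time(n_bytes, t_lat, bw)

# Assumed node characteristics: 10 us latency, 100 GB/s bandwidth.
t_lat = 1e-5   # seconds
bw = 1e11      # bytes/second

# Minimum array size for >= 50% efficiency: n_min = t_lat * bw.
n_min = t_lat * bw
print(n_min)                          # ~1e6 bytes, i.e. about 1 MB per node
print(efficiency(n_min, t_lat, bw))   # ~0.5: latency and transfer time balance
```

Below `n_min` the fixed latency dominates each operation, so adding nodes (which shrinks the per-node array) stops paying off; above it, the transfer term dominates and scaling efficiency climbs toward 1.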
