1 - 10 of 10

  • 1.
    Akhmetova, Dana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Cebamanos, L.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Rotaru, T.
    Rahn, M.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Bartsch, V.
    Simmendinger, C.
    Interoperability of GASPI and MPI in large scale scientific applications, 2018. In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Springer Verlag, 2018, pp. 277-287. Conference paper (Refereed)
    Abstract [en]

    One of the main hurdles to a broad adoption of PGAS approaches is the prevalence of MPI, which as a de facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by presenting our study and preliminary performance results on interoperating GASPI and MPI in the performance-critical parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.
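
    To make the interoperability setting concrete, here is a minimal mixed-mode sketch in C. It assumes a GPI-2 installation of GASPI built with MPI interoperability, where MPI_Init is expected before gaspi_proc_init; the segment id, transfer sizes, and queue number are arbitrary illustration values, not taken from the paper.

        #include <mpi.h>
        #include <GASPI.h>

        int main(int argc, char **argv)
        {
            /* MPI owns process startup; GASPI attaches to the same processes
               (assumes a GPI-2 build with MPI interoperability enabled). */
            MPI_Init(&argc, &argv);
            gaspi_proc_init(GASPI_BLOCK);

            gaspi_rank_t rank, nprocs;
            gaspi_proc_rank(&rank);
            gaspi_proc_num(&nprocs);

            /* A 1 MiB GASPI segment for one-sided, notified communication. */
            gaspi_segment_create(0, 1 << 20, GASPI_GROUP_ALL,
                                 GASPI_BLOCK, GASPI_MEM_INITIALIZED);

            if (nprocs > 1) {
                if (rank == 0) {
                    /* one-sided write of 1 KiB to rank 1, with notification 0 */
                    gaspi_write_notify(0, 0, 1, 0, 0, 1 << 10,
                                       0, 1, 0, GASPI_BLOCK);
                    gaspi_wait(0, GASPI_BLOCK);        /* flush queue 0 */
                } else if (rank == 1) {
                    gaspi_notification_id_t fid;
                    gaspi_notification_t val;
                    gaspi_notify_waitsome(0, 0, 1, &fid, GASPI_BLOCK);
                    gaspi_notify_reset(0, fid, &val);
                }
            }

            /* ...while plain MPI remains usable over the same processes. */
            MPI_Barrier(MPI_COMM_WORLD);

            gaspi_proc_term(GASPI_BLOCK);
            MPI_Finalize();
            return 0;
        }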

  • 2.
    Al Ahad, Muhammed Abdullah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC).
    Simmendinger, Christian
    T Syst Solut Res GmbH, D-70563 Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC).
    Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows, 2018. In: Proceedings of PAW-ATM18: 2018 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI (PAW-ATM), IEEE, 2018, pp. 1-10. Conference paper (Refereed)
    Abstract [en]

    Collective operations are commonly used in various parts of scientific applications. Especially in strong-scaling scenarios, collective operations can negatively impact overall application performance: while the load per rank decreases with increasing core counts, the time spent in, e.g., barrier operations increases logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploiting the benefits of this extension, we deliver high-performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations achieve 2x-4x performance improvements compared to the best-performing MPI implementations for various data distributions.
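
    The GASPI extension described above is not part of the public standard, so as an illustration of the same shared-window idea in standard terms, here is a sketch using MPI-3 shared memory: ranks on a node reduce through a window they can all load and store, and one leader per node runs the inter-node step. Function names are standard MPI-3; the scheme is only an analogue of the paper's approach, not its implementation.

        #include <mpi.h>

        /* Allreduce (sum) sketch: node-local reduction through a shared
           window, inter-node reduction among the node leaders. */
        double shared_window_allreduce(double x, MPI_Comm comm)
        {
            MPI_Comm node, leaders;
            int nrank, nsize;
            MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0,
                                MPI_INFO_NULL, &node);
            MPI_Comm_rank(node, &nrank);
            MPI_Comm_size(node, &nsize);

            /* each node rank owns one slot; default allocation is one
               contiguous buffer across the node's ranks */
            MPI_Win win;
            double *mine, *buf;
            MPI_Aint wsize; int disp;
            MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                                    MPI_INFO_NULL, node, &mine, &win);
            MPI_Win_shared_query(win, 0, &wsize, &disp, &buf);

            /* all ranks of comm must call the split; non-leaders get
               MPI_COMM_NULL */
            MPI_Comm_split(comm, nrank == 0 ? 0 : MPI_UNDEFINED, 0, &leaders);

            MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
            *mine = x;                        /* publish my contribution */
            MPI_Win_sync(win);
            MPI_Barrier(node);
            MPI_Win_sync(win);

            if (nrank == 0) {
                double sum = 0.0;             /* node-local reduction ... */
                for (int i = 0; i < nsize; i++)
                    sum += buf[i];
                /* ... then combine across nodes */
                MPI_Allreduce(MPI_IN_PLACE, &sum, 1, MPI_DOUBLE,
                              MPI_SUM, leaders);
                buf[0] = sum;                 /* publish the global result */
                MPI_Comm_free(&leaders);
            }
            MPI_Win_sync(win);
            MPI_Barrier(node);
            MPI_Win_sync(win);
            double result = buf[0];
            MPI_Win_unlock_all(win);

            MPI_Win_free(&win);
            MPI_Comm_free(&node);
            return result;
        }

    The attraction of the shared window is that the node-local phase is pure loads and stores; the paper's GASPI extension additionally lets all node-local ranks observe notifications, which avoids the barrier synchronization used in this sketch.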

  • 3. Iakymchuk, Roman
    Reproducibility-Assuring Strategies for Parallel Preconditioned Conjugate Gradient. Manuscript (preprint) (Other academic)
    Abstract [en]

    The Preconditioned Conjugate Gradient method is often used in numerical simulations ranging from gyro-fluid to space plasma physics. While widely used, the solver is also known for its lack of accuracy when computing the residual. In this article, we pursue a twofold goal: to enhance the accuracy of the solver and to ensure its reproducibility in a message-passing implementation. We design and employ various strategies, ranging from the ExBLAS approach (preserving every bit of information until final rounding) to a more lightweight, performance-oriented strategy (expanding the intermediate precision). These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these strategies on modern HPC systems with up to 768 processes.
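
    As a concrete instance of the lightweight, expanded-precision strategy (a sketch of the general technique, not the article's ExBLAS code), a dot product can carry the rounding error of every operation in a second double using error-free transformations; with a fixed summation order, its result is also run-to-run reproducible:

        #include <math.h>   /* fma */

        /* Dot product with doubled intermediate precision: TwoSum for the
           accumulation error, an FMA for the exact product error. */
        double dot2(const double *x, const double *y, int n)
        {
            double s = 0.0, c = 0.0;                 /* high part, error part */
            for (int i = 0; i < n; i++) {
                double p  = x[i] * y[i];
                double ep = fma(x[i], y[i], -p);     /* exact product error */
                double t  = s + p;                   /* TwoSum(s, p): */
                double z  = t - s;
                double es = (s - (t - z)) + (p - z); /* exact addition error */
                s = t;
                c += ep + es;
            }
            return s + c;                            /* single final rounding */
        }

    On the message-passing side, reproducibility additionally requires a deterministic reduction order, e.g. combining the (s, c) pairs across ranks in a fixed tree rather than summing plain doubles with a vendor MPI_Allreduce.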

  • 4.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing (PDC).
    Jordan, Herbert
    University of Innsbruck, Institute of Computer Science.
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing (PDC).
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing (PDC).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing (PDC).
    A Particle-in-Cell Method for Automatic Load-Balancing with the AllScale Environment, 2016. Conference paper (Other academic)
    Abstract [en]

    We present an initial design and implementation of a Particle-in-Cell (PIC) method based on the work carried out in the European Exascale AllScale project. AllScale provides a unified programming system for the effective development of highly scalable, resilient and performance-portable parallel applications for Exascale systems. The AllScale approach is based on task-based nested recursive parallelism and provides mechanisms for automatic load-balancing in PIC simulations. We provide preliminary results of the AllScale-based PIC implementation and outline directions for its future development.
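
    AllScale itself is a C++ template-based system; purely to illustrate task-based nested recursive parallelism, here is the same pattern with OpenMP tasks in C. The range_t type and the per-cell kernel are invented for the example and do not come from the paper.

        #include <stdio.h>

        typedef struct { int first, count; } range_t;  /* hypothetical cell range */

        static void push_particles_in_cell(int cell)   /* stand-in PIC kernel */
        {
            (void)cell;  /* the particle push for one cell would go here */
        }

        /* Recursively split the cell range; the runtime can balance load
           by executing any subtree of tasks on any idle thread. */
        static void pic_step(range_t r)
        {
            if (r.count <= 64) {                       /* leaf: run serially */
                for (int c = r.first; c < r.first + r.count; c++)
                    push_particles_in_cell(c);
                return;
            }
            range_t lo = { r.first, r.count / 2 };
            range_t hi = { r.first + r.count / 2, r.count - r.count / 2 };
            #pragma omp task                 /* lo is firstprivate by default */
            pic_step(lo);
            pic_step(hi);
            #pragma omp taskwait
        }

        int main(void)
        {
            #pragma omp parallel
            #pragma omp single
            pic_step((range_t){ 0, 1 << 16 });
            return 0;
        }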

  • 5.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Shakhno, S. M.
    Yarmola, H. P.
    Convergence Analysis of a Two-Step Modification of the Gauss-Newton Method and Its Applications, 2017. In: Journal of Numerical and Applied Mathematics, ISSN 0868-6912, Vol. 3, No. 126, pp. 61-74. Article in journal (Refereed)
    Abstract [en]

    We investigate the convergence of a two-step modification of the Gauss-Newton method applying the generalized Lipschitz condition for the first- and second-order derivatives. The convergence order as well as the convergence radius of the method are studied and the uniqueness ball of the solution of the nonlinear least squares problem is examined. Finally, we carry out numerical experiments on a set of well-known test problems.
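
    The abstract does not reproduce the iteration itself; for orientation, a common two-step modification with Jacobian reuse, applied to the nonlinear least squares problem of minimizing (1/2)||F(x)||^2, has the form below. This is an assumed form for illustration; the paper's exact scheme may differ.

        % Two-step Gauss-Newton modification, reusing J(x_k) in the
        % corrector step (assumed form, not quoted from the paper):
        \begin{aligned}
          y_k     &= x_k - \bigl(J(x_k)^{\top} J(x_k)\bigr)^{-1} J(x_k)^{\top} F(x_k),\\
          x_{k+1} &= y_k - \bigl(J(x_k)^{\top} J(x_k)\bigr)^{-1} J(x_k)^{\top} F(y_k),
        \end{aligned}
        % so each iteration needs one Jacobian evaluation and one
        % factorization of J^T J, but two residual evaluations.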

  • 6.
    Markidis, Stefano
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Iakymchuk, Roman
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Kestor, G.
    Gioiosa, R.
    A performance characterization of streaming computing on supercomputers, 2016. In: Procedia Computer Science, Elsevier, 2016, pp. 98-107. Conference paper (Refereed)
    Abstract [en]

    Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are more and more used on supercomputers to solve data-intensive problems. Because supercomputers have mainly been used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating-point operations per unit time and cannot fully characterize the performance of a streaming application on supercomputers. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.
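
    As a minimal illustration of the two metrics (a toy two-rank pipeline, not the modified STREAM benchmark or the MPI streaming library from the paper): the producer's message count divided by its elapsed time is the injection rate, and the consumer's count divided by its elapsed time is the processing rate.

        #include <mpi.h>
        #include <stdio.h>

        #define NMSG 100000   /* messages in the stream */
        #define LEN  1024     /* doubles per message */

        /* Run with at least two MPI ranks: rank 0 produces, rank 1 consumes. */
        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            double buf[LEN] = {0};
            double t0 = MPI_Wtime();
            if (rank == 0) {                          /* producer */
                for (int i = 0; i < NMSG; i++)
                    MPI_Send(buf, LEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                printf("injection rate:  %.0f msg/s\n",
                       NMSG / (MPI_Wtime() - t0));
            } else if (rank == 1) {                   /* consumer */
                double s = 0.0;
                for (int i = 0; i < NMSG; i++) {
                    MPI_Recv(buf, LEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    for (int j = 0; j < LEN; j++)     /* "processing" task */
                        s += buf[j];
                }
                printf("processing rate: %.0f msg/s (sum %g)\n",
                       NMSG / (MPI_Wtime() - t0), s);
            }
            MPI_Finalize();
            return 0;
        }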

  • 7.
    Simmendinger, Christian
    et al.
    T Syst Solut Res, Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Cebamanos, Luis
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Akhmetova, Dana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Bartsch, Valeria
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Rotaru, Tiberiu
    Fraunhofer ITWM, Kaiserslautern, Germany.
    Rahn, Mirko
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC). KTH Royal Institute of Technology, High Performance Computing, Stockholm, Sweden; KTH Royal Institute of Technology, PDC Centre for High Performance Computing, Stockholm, Sweden.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH Royal Institute of Technology, High Performance Computing, Stockholm, Sweden.
    Interoperability strategies for GASPI and MPI in large-scale scientific applications, 2019. In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, No. 3, pp. 554-568. Article in journal (Refereed)
    Abstract [en]

    One of the main hurdles of partitioned global address space (PGAS) approaches is the dominance of message passing interface (MPI), which as a de facto standard appears in the code base of many applications. To take advantage of PGAS APIs like the global address space programming interface (GASPI) without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication- and performance-critical parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.

  • 8.
    Thoman, Peter
    et al.
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Dichev, Kiril
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, Northern Ireland.
    Heller, Thomas
    Univ Erlangen Nurnberg, D-91058 Erlangen, Germany.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Hasanov, Khalid
    IBM Ireland, Dublin 15, Ireland.
    Gschwandtner, Philipp
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin 15, Ireland.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Jordan, Herbert
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Fahringer, Thomas
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Katrinis, Kostas
    IBM Ireland, Dublin 15, Ireland.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing (PDC).
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, Northern Ireland.
    A taxonomy of task-based parallel programming technologies for high-performance computing, 2018. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, No. 4, pp. 1422-1434. Article in journal (Refereed)
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 9.
    Thoman, Peter
    et al.
    Univ Innsbruck, Innsbruck, Austria.
    Hasanov, Khalid
    IBM Ireland, Dublin, Ireland.
    Dichev, Kiril
    Queens Univ Belfast, Belfast, Antrim, Northern Ireland.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH.
    Gschwandtner, Philipp
    Univ Innsbruck, Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin, Ireland.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Jordan, Herbert
    Univ Innsbruck, Innsbruck, Austria.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Katrinis, Kostas
    IBM Ireland, Dublin, Ireland.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast, Antrim, Northern Ireland.
    Fahringer, Thomas
    Univ Innsbruck, Innsbruck, Austria.
    A Taxonomy of Task-Based Technologies for High-Performance Computing, 2018. In: Parallel Processing and Applied Mathematics (PPAM 2017), Part II / [ed] Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K., Springer International Publishing AG, 2018, pp. 264-274. Conference paper (Refereed)
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in heterogeneous, many-core and parallel systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing, no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 10. Wiesenberger, M.
    et al.
    Einkemmer, L.
    Held, M.
    Gutierrez-Milla, A.
    Sáez, X.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Reproducibility, accuracy and performance of the FELTOR code and library on parallel computer architectures, 2019. In: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 238, pp. 145-156. Article in journal (Refereed)
    Abstract [en]

    FELTOR is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed-memory systems. FELTOR consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behavior. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behavior translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. We briefly discuss alternative methods to ensure the correctness of results, like the convergence of reduced physical quantities of interest, ensemble simulations, invariants, or reduced simulation times. In a second part, we explore important performance-tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in FELTOR and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 10⁻¹ and 10³ MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node to achieve a scaling efficiency above 50% (both strong and weak).
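
    The closing latency-bandwidth claim can be made precise with the simple linear cost model the abstract alludes to (our reconstruction in LaTeX; the paper's full model has more detail):

        % A routine that moves n bytes at bandwidth B after paying a
        % fixed latency t_lat costs
        T(n) = t_{\mathrm{lat}} + \frac{n}{B},
        % so the efficiency, i.e. the fraction of time spent on useful
        % transfer work, is
        E(n) = \frac{n/B}{\,t_{\mathrm{lat}} + n/B\,},
        \qquad
        E(n) \ge \tfrac12 \iff n \ge t_{\mathrm{lat}}\, B .
        % The latency-bandwidth product is thus the minimum array size
        % per node for at least 50% scaling efficiency.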
