  • 51. Lapenta, Giovanni
    et al.
    Pierrard, Viviane
    Keppens, Rony
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Poedts, Stefaan
    Sebek, Ondrej
    Travnicek, Pavel M.
    Henri, Pierre
    Califano, Francesco
    Pegoraro, Francesco
    Faganello, Matteo
    Olshevsky, Vyacheslav
    Restante, Anna Lisa
    Nordlund, Åke
    Frederiksen, Jacob Trier
    Mackay, Duncan H.
    Parnell, Clare E.
    Bemporad, Alessandro
    Susino, Roberto
    Borremans, Kris
    SWIFF: Space weather integrated forecasting framework. 2013. In: Journal of Space Weather and Space Climate, ISSN 2115-7251, E-ISSN 2115-7251, Vol. 3, p. A05. Article in journal (Refereed)
    Abstract [en]

    SWIFF is a project funded by the Seventh Framework Programme of the European Commission to study the mathematical-physics models that form the basis for space weather forecasting. The phenomena of space weather span a tremendous scale of densities and temperature with scales ranging 10 orders of magnitude in space and time. Additionally even in local regions there are concurrent processes developing at the electron, ion and global scales strongly interacting with each other. The fundamental challenge in modelling space weather is the need to address multiple physics and multiple scales. Here we present our approach to take existing expertise in fluid and kinetic models to produce an integrated mathematical approach and software infrastructure that allows fluid and kinetic processes to be modelled together. SWIFF aims also at using this new infrastructure to model specific coupled processes at the Solar Corona, in the interplanetary space and in the interaction at the Earth magnetosphere.

  • 52.
    Ma, Yingjuan
    et al.
    Univ Calif Los Angeles, Dept Earth Planetary & Space Sci, Los Angeles, CA 90095 USA.
    Russell, Christopher T.
    Univ Calif Los Angeles, Dept Earth Planetary & Space Sci, Los Angeles, CA 90095 USA.
    Toth, Gabor
    Univ Michigan, Dept Climate & Space Sci & Engn, Ann Arbor, MI 48109 USA.
    Chen, Yuxi
    Univ Michigan, Dept Climate & Space Sci & Engn, Ann Arbor, MI 48109 USA.
    Nagy, Andrew F.
    Univ Michigan, Dept Climate & Space Sci & Engn, Ann Arbor, MI 48109 USA.
    Harada, Yuki
    Univ Iowa, Dept Phys & Astron, Iowa City, IA 52242 USA.
    McFadden, James
    Univ Calif Berkeley, Space Sci Lab, Berkeley, CA 94720 USA.
    Halekas, Jasper S.
    Univ Iowa, Dept Phys & Astron, Iowa City, IA 52242 USA.
    Lillis, Rob
    Univ Calif Berkeley, Space Sci Lab, Berkeley, CA 94720 USA.
    Connerney, John E. P.
    NASA, Goddard Space Flight Ctr, Greenbelt, MD USA.
    Espley, Jared
    NASA, Goddard Space Flight Ctr, Greenbelt, MD USA.
    DiBraccio, Gina A.
    NASA, Goddard Space Flight Ctr, Greenbelt, MD USA.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Peng, Ivy Bo
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Fang, Xiaohua
    Univ Colorado, Lab Atmospher & Space Phys, Boulder, CO 80309 USA.
    Jakosky, Bruce M.
    Univ Colorado, Lab Atmospher & Space Phys, Boulder, CO 80309 USA.
    Reconnection in the Martian Magnetotail: Hall-MHD With Embedded Particle-in-Cell Simulations. 2018. In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 123, no. 5, pp. 3742-3763. Article in journal (Refereed)
    Abstract [en]

    Mars Atmosphere and Volatile EvolutioN (MAVEN) mission observations show clear evidence of the occurrence of the magnetic reconnection process in the Martian plasma tail. In this study, we use sophisticated numerical models to help us understand the effects of magnetic reconnection in the plasma tail. The numerical models used in this study are (a) a multispecies global Hall-magnetohydrodynamic (HMHD) model and (b) a global HMHD model two-way coupled to an embedded fully kinetic particle-in-cell code. Comparison with MAVEN observations clearly shows that the general interaction pattern is well reproduced by the global HMHD model. The coupled model takes advantage of both the efficiency of the MHD model and the ability to incorporate kinetic processes of the particle-in-cell model, making it feasible to conduct kinetic simulations for Mars under realistic solar wind conditions for the first time. Results from the coupled model show that the Martian magnetotail is highly dynamic due to magnetic reconnection, and the resulting Mars-ward plasma flow velocities are significantly higher for the lighter ion fluid, which are quantitatively consistent with MAVEN observations. The HMHD with Embedded Particle-in-Cell model predicts that the ion loss rates are more variable but with similar mean values as compared with HMHD model results.

  • 53. Manzini, G.
    et al.
    Delzanno, G. L.
    Vencels, J.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    A Legendre-Fourier spectral method with exact conservation laws for the Vlasov-Poisson system. 2016. In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 317, pp. 82-107. Article in journal (Refereed)
    Abstract [en]

    We present the design and implementation of an L²-stable spectral method for the discretization of the Vlasov-Poisson model of a collisionless plasma in one space and velocity dimension. The velocity and space dependence of the Vlasov equation are resolved through a truncated spectral expansion based on Legendre and Fourier basis functions, respectively. The Poisson equation, which is coupled to the Vlasov equation, is also resolved through a Fourier expansion. The resulting system of ordinary differential equations is discretized by the implicit second-order accurate Crank-Nicolson time discretization. The non-linear dependence between the Vlasov and Poisson equations is iteratively solved at any time cycle by a Jacobian-Free Newton-Krylov method. In this work we analyze the structure of the main conservation laws of the resulting Legendre-Fourier model, e.g., mass, momentum, and energy, and prove that they are exactly satisfied in the semi-discrete and discrete setting. The L²-stability of the method is ensured by discretizing the boundary conditions of the distribution function at the boundaries of the velocity domain by a suitable penalty term. The impact of the penalty term on the conservation properties is investigated theoretically and numerically. An implementation of the penalty term that does not affect the conservation of mass, momentum and energy is also proposed and studied. A collisional term is introduced in the discrete model to control the filamentation effect, but does not affect the conservation properties of the system. Numerical results on a set of standard test problems illustrate the performance of the method.

  • 54. Marchand, R.
    et al.
    Miyake, Y.
    Usui, H.
    Deca, J.
    Lapenta, G.
    Mateo-Velez, J. C.
    Ergun, R. E.
    Sturner, A.
    Genot, V.
    Hilgers, A.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Cross-comparison of spacecraft-environment interaction model predictions applied to Solar Probe Plus near perihelion. 2014. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 21, no. 6, p. 062901. Article in journal (Refereed)
    Abstract [en]

    Five spacecraft-plasma models are used to simulate the interaction of a simplified geometry Solar Probe Plus (SPP) satellite with the space environment under representative solar wind conditions near perihelion. By considering similarities and differences between results obtained with different numerical approaches under well defined conditions, the consistency and validity of our models can be assessed. The impact on model predictions of physical effects of importance in the SPP mission is also considered by comparing results obtained with and without these effects. Simulation results are presented and compared with increasing levels of complexity in the physics of interaction between solar environment and the SPP spacecraft. The comparisons focus particularly on spacecraft floating potentials, contributions to the currents collected and emitted by the spacecraft, and on the potential and density spatial profiles near the satellite. The physical effects considered include spacecraft charging, photoelectron and secondary electron emission, and the presence of a background magnetic field. Model predictions obtained with our different computational approaches are found to be in agreement within 2% when the same physical processes are taken into account and treated similarly. The comparisons thus indicate that, with the correct description of important physical effects, our simulation models should have the required skill to predict details of satellite-plasma interaction physics under relevant conditions, with a good level of confidence. Our models concur in predicting a negative floating potential V_fl ~ -10 V for SPP at perihelion. They also predict a "saturated emission regime" whereby most emitted photo- and secondary electrons will be reflected by a potential barrier near the surface, back to the spacecraft where they will be recollected.

  • 55.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, G.
    Delzanno, G. L.
    Henri, P.
    Goldman, M. V.
    Newman, D. L.
    Intrator, T.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Signatures of secondary collisionless magnetic reconnection driven by kink instability of a flux rope. 2014. In: Plasma Physics and Controlled Fusion, ISSN 0741-3335, E-ISSN 1361-6587, Vol. 56, no. 6, p. 064010. Article in journal (Refereed)
    Abstract [en]

    The kinetic features of secondary magnetic reconnection in a single flux rope undergoing internal kink instability are studied by means of three-dimensional particle-in-cell simulations. Several signatures of secondary magnetic reconnection are identified in the plane perpendicular to the flux rope: a quadrupolar electron and ion density structure and a bipolar Hall magnetic field develop in proximity of the reconnection region. The most intense electric fields form perpendicularly to the local magnetic field, and a reconnection electric field is identified in the plane perpendicular to the flux rope. An electron current develops along the reconnection line, in the opposite direction of the electron current supporting the flux rope magnetic field structure. Along the reconnection line, several bipolar structures of the electric field parallel to the magnetic field occur, making the magnetic reconnection region turbulent. The reported signatures of secondary magnetic reconnection can help to localize magnetic reconnection events in space, astrophysical and fusion plasmas.

  • 56.
    Markidis, Stefano
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Chien, Steven Wei Der
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, I. B.
    Vetter, J. S.
    NVIDIA tensor core programmability, performance & precision. 2018. In: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, pp. 522-531, article id 8425458. Conference paper (Refereed)
    Abstract [en]

    The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called Tensor Core, that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to program NVIDIA Tensor Cores, their performance and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API, CUTLASS, a templated library based on WMMA, and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While precision loss due to matrix multiplication with half precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using NVIDIA Tensor Cores.

  • 57.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Hart, Alistair
    Henty, David
    Heisey, Katherine
    Fischer, Paul
    OpenACC acceleration of the Nek5000 spectral element code. 2015. In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 29, no. 3, pp. 311-319. Article in journal (Refereed)
    Abstract [en]

    We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code has been used as the base and enhanced by OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version leads to a 43 Gflop performance on a single node. In addition, we ported and optimized NekBone to parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.

  • 58.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Henri, P.
    Lapenta, G.
    Divin, A.
    Goldman, M.
    Newman, D.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Kinetic simulations of plasmoid chain dynamics. 2013. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 20, no. 8, p. 082105. Article in journal (Refereed)
    Abstract [en]

    The dynamics of a plasmoid chain is studied with three-dimensional Particle-in-Cell simulations. The evolution of the system with and without a uniform guide field, whose strength is 1/3 the asymptotic magnetic field, is investigated. The plasmoid chain forms by spontaneous magnetic reconnection: the tearing instability rapidly disrupts the initial current sheet, generating several small-scale plasmoids that rapidly grow in size, coalescing and kinking. The plasmoid kink is mainly driven by the coalescence process. It is found that the presence of a guide field strongly influences the evolution of the plasmoid chain. Without a guide field, a main reconnection site dominates and smaller reconnection regions are included in larger ones, leading to a hierarchical structure of the plasmoid-dominated current sheet. In contrast, in the presence of a guide field, plasmoids have approximately the same size and the hierarchical structure does not emerge, a strong core magnetic field develops in the center of the plasmoid in the direction of the existing guide field, and bump-on-tail instability, leading to the formation of electron holes, is detected in proximity of the plasmoids.

  • 59.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Henri, P.
    Lapenta, G.
    Divin, A.
    Goldman, M. V.
    Newman, D.
    Eriksson, S.
    Collisionless magnetic reconnection in a plasmoid chain. 2012. In: Nonlinear processes in geophysics, ISSN 1023-5809, E-ISSN 1607-7946, Vol. 19, no. 1, pp. 145-153. Article in journal (Refereed)
    Abstract [en]

    The kinetic features of plasmoid chain formation and evolution are investigated by two-dimensional Particle-in-Cell simulations. Magnetic reconnection is initiated in multiple X points by the tearing instability. Plasmoids form and grow in size by continuously coalescing. Each chain plasmoid exhibits a strong out-of-plane core magnetic field and an out-of-plane electron current that drives the coalescing process. The disappearance of the X points in the coalescence process is due to anti-reconnection, a magnetic reconnection where the plasma inflow and outflow are reversed with respect to the original reconnection flow pattern. Anti-reconnection is characterized by the Hall magnetic field quadrupole signature. Two new kinetic features, not reported by previous studies of plasmoid chain evolution, are here revealed. First, intense electric fields develop in-plane normally to the separatrices and drive the ion dynamics in the plasmoids. Second, several bipolar electric field structures are localized in proximity of the plasmoid chain. The analysis of the electron distribution function and phase space reveals the presence of counter-streaming electron beams, unstable to the two-stream instability, and phase space electron holes along the reconnection separatrices.

  • 60.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Henri, Pierre
    Lapenta, Giovanni
    Ronnmark, Kjell
    Hamrin, Maria
    Meliani, Zakaria
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    The Fluid-Kinetic Particle-in-Cell method for plasma simulations. 2014. In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 271, pp. 415-429. Article in journal (Refereed)
    Abstract [en]

    A method that concurrently solves the multi-fluid and Maxwell's equations has been developed for plasma simulations. By calculating the stress tensor in the multi-fluid momentum equation by means of computational particles moving in a self-consistent electromagnetic field, the kinetic effects are retained while solving the multi-fluid equations. Maxwell's equations and the multi-fluid equations are discretized implicitly in time, enabling kinetic simulations over time scales typical of fluid simulations. The Fluid-Kinetic Particle-in-Cell method has been implemented in a three-dimensional electromagnetic code, and tested against the two-stream instability, the Weibel instability, the ion cyclotron resonance and magnetic reconnection problems. The method is a promising approach for coupling fluid and kinetic methods in a unified framework.

  • 61.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Lapenta, G.
    Divin, A.
    Goldman, M.
    Newman, D.
    Andersson, L.
    Three dimensional density cavities in guide field collisionless magnetic reconnection. 2012. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 19, no. 3, p. 032119. Article in journal (Refereed)
    Abstract [en]

    Particle-in-cell simulations of collisionless magnetic reconnection with a guide field reveal for the first time the three-dimensional features of the low density regions along the magnetic reconnection separatrices, the so-called cavities. It is found that structures with even lower density develop within the cavities. Because they resemble ribs in shape, these formations are here called low density ribs. Their location remains approximately fixed in time and their density progressively decreases, as electron currents along the cavities evacuate them. They develop along the magnetic field lines and are supported by a strong perpendicular electric field that oscillates in space. In addition, bipolar parallel electric field structures form as isolated spheres between the cavities and the outflow plasma, along the direction of the low density ribs and of magnetic field lines.

  • 62.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Solving software challenges for exascale: International Conference on Exascale Applications and Software, EASC 2014 Stockholm, Sweden, April 2–3, 2014 revised selected papers. 2015. In: International Conference on Exascale Applications and Software, EASC 2014, Elsevier, 2015, Vol. 8759. Conference paper (Refereed)
  • 63.
    Markidis, Stefano
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Olshevsky, Vyacheslav
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Sishtla, C. P.
    Chien, Steven W. D.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Lapenta, G.
    PolyPIC: The polymorphic-particle-in-cell method for fluid-kinetic coupling. 2018. In: Frontiers in Physics, E-ISSN 2296-424X, Vol. 6, no. OCT, article id 100. Article in journal (Refereed)
    Abstract [en]

    Particle-in-Cell (PIC) methods are widely used computational tools for fluid and kinetic plasma modeling. While both the fluid and kinetic PIC approaches have been successfully used to target either kinetic or fluid simulations, little was done to combine fluid and kinetic particles under the same PIC framework. This work addresses this issue by proposing a new PIC method, PolyPIC, that uses polymorphic computational particles. In this numerical scheme, particles can be either kinetic or fluid, and fluid particles can become kinetic when necessary, e.g., particles undergoing a strong acceleration. We design and implement the PolyPIC method, and test it against the Landau damping of Langmuir and ion acoustic waves, two stream instability and sheath formation. We unify the fluid and kinetic PIC methods under one common framework comprising both fluid and kinetic particles, providing a tool for adaptive fluid-kinetic coupling in plasma simulations.

  • 64.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Larsson Träff, Jesper
    Rougier, Antoine
    Bartsch, Valeria
    Machado, Rui
    Rahn, Mirko
    Hart, Alistair
    Holmes, Daniel
    Bull, Mark
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The EPiGRAM Project: Preparing Parallel Programming Models for Exascale. 2016. In: High Performance Computing, ISC High Performance 2016 International Workshops, Springer, 2016, pp. 56-68. Conference paper (Refereed)
    Abstract [en]

    EPiGRAM is a European Commission funded project to improve existing parallel programming models so that large-scale applications run efficiently on exascale supercomputers. The EPiGRAM project focuses on the two currently dominant petascale programming models, message-passing and PGAS, and on the improvement of two of their associated programming systems, MPI and GASPI. In EPiGRAM, we work on two major aspects of programming systems. First, we improve the performance of communication operations by decreasing the memory consumption, improving collective operations and introducing emerging computing models. Second, we enhance the interoperability of message-passing and PGAS by integrating them in one PGAS-based MPI implementation, called EMPI4Re, implementing MPI endpoints and improving GASPI interoperability with MPI. The new EPiGRAM concepts are tested in two large-scale applications, iPIC3D, a Particle-in-Cell code for space physics simulations, and Nek5000, a Computational Fluid Dynamics code.

  • 65.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Iakymchuk, Roman
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Kestor, G.
    Gioiosa, R.
    A performance characterization of streaming computing on supercomputers. 2016. In: Procedia Computer Science, Elsevier, 2016, pp. 98-107. Conference paper (Refereed)
    Abstract [en]

    Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are increasingly used on supercomputers to solve data-intensive problems. Because supercomputers have been mainly used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating point operations in time and cannot fully characterize the performance of a streaming application on supercomputers. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.

  • 66.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Aguilar, Xavier
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Henty, David
    University of Edinburgh.
    Richardson, Harvey
    Cray Inc..
    Hart, Alistair
    Cray Inc..
    Gray, Alan
    University of Edinburgh.
    Lecomber, David
    Allinea Software Limited.
    Hilbrich, Tobias
    Technische Universität Dresden.
    Doleschal, Jens
    Technische Universität Dresden.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Paving the path to exascale computing with CRESTA development environment. 2013. Conference paper (Other academic)
    Abstract [en]

    The development and implementation of efficient computer codes for exascale supercomputers will require combined advancement of all development environment components: compilers, automatic tuning frameworks, run-time systems, debuggers and performance monitoring and analysis tools. The exascale era poses unprecedented challenges. Because accelerators are increasingly common among the fastest supercomputers and will play a role in exascale computing, compilers will need to support hybrid computer architectures and generate efficient code that hides the complexity of programming accelerators. Hand optimization of the code will be very difficult on exascale machines and will be increasingly assisted by automatic tuners. Application tuning will focus more on parallel aspects of the computation because of the large amount of available parallelism. The application workload will be distributed over millions of processes; implementing ad-hoc strategies directly in the application will probably be infeasible, while an adaptive run-time system will provide automatic load balancing. Debuggers and performance monitoring tools will deal with millions of processes and with huge amounts of data from application and hardware counters, but they will still be required to minimize the overhead and retain scalability. In this talk, we present how the development environment of the CRESTA exascale EC project meets all these challenges by advancing the state of the art in the field.

    An investigation of compiler support for hybrid GPU programming, the design concepts, and the main characteristics of the alpha prototype implementation of the CRESTA development environment components for exascale computing are presented. A performance study of OpenACC compiler directives has been carried out, showing very promising results and indicating that OpenACC is a viable approach for programming hybrid exascale supercomputers. A new Domain-Specific Language (DSL) has been defined for the expression of parallel auto-tuning at very large scale. The focus is on extending the auto-tuning approach into the parallel domain to enable tuning of communication-related aspects of the application. A new adaptive run-time system has been designed to schedule processes depending on resource availability, on the workload, and on run-time analysis of the application performance. The Allinea DDT debugger and the Dresden University of Technology MUST MPI correctness checker are being extended to provide a unified interface, to improve scalability, and to include new disruptive technology based on statistical analysis of the run-time behavior of the application for anomaly detection. The new exascale prototypes of the Dresden University of Technology Vampir, VampirTrace and Score-P performance monitoring and analysis tools have been released. The new features include the possibility of applying filtering techniques before loading performance data to drastically reduce memory needs during performance analysis. The initial evaluation study of the development environment targets the CRESTA project applications to determine how the development environment could be coupled into a production suite for exascale computing.

  • 67.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Vencels, Juris
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Akhmetova, Dana
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Henri, Pierre
    Idle waves in high-performance computing. 2015. In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, E-ISSN 1550-2376, Vol. 91, no. 1, p. 013306. Journal article (Refereed)
    Abstract [en]

    The vast majority of parallel scientific applications distributes computation among processes that are in a busy state when computing and in an idle state when waiting for information from other processes. We identify the propagation of idle waves through processes in scientific applications with a local information exchange between the two processes. Idle waves are nondispersive and have a phase velocity inversely proportional to the average busy time. The physical mechanism enabling the propagation of idle waves is the local synchronization between two processes due to remote data dependency. This study provides a description of the large number of processes in parallel scientific applications as a continuous medium. This work also is a step towards an understanding of how localized idle periods can affect remote processes, leading to the degradation of global performance in parallel scientific applications.
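
    The propagation mechanism described in this abstract can be illustrated with a toy model. The sketch below is not the authors' model: it assumes a one-dimensional chain of ranks in which each rank can begin an iteration only after it and its two neighbours have finished the previous one, and it injects a single delay at one rank. The function names and all parameters (`simulate_idle_wave`, `front_positions`, `busy`, `delay`, `delayed_rank`) are hypothetical.

```python
def simulate_idle_wave(n_procs=21, n_iters=8, busy=1.0, delay=5.0, delayed_rank=10):
    """Return finish[k][i], the time at which rank i completes iteration k."""
    prev = [0.0] * n_procs  # every rank starts at time zero
    finish = []
    for k in range(n_iters):
        cur = []
        for i in range(n_procs):
            # Local synchronization: a rank needs data from both neighbours
            # (the two chain ends only have one neighbour each).
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < n_procs - 1 else prev[i]
            start = max(left, prev[i], right)
            # One-off perturbation injected at a single rank in iteration 0.
            extra = delay if (k == 0 and i == delayed_rank) else 0.0
            cur.append(start + busy + extra)
        finish.append(cur)
        prev = cur
    return finish


def front_positions(finish, busy, delayed_rank):
    """Distance in ranks from the delayed rank to the rightmost late rank, per iteration."""
    fronts = []
    for k, row in enumerate(finish):
        on_time = (k + 1) * busy  # completion time in an unperturbed run
        late = [i for i, t in enumerate(row) if t > on_time + 1e-9]
        fronts.append(max(late) - delayed_rank)
    return fronts


if __name__ == "__main__":
    for busy in (1.0, 2.0):
        finish = simulate_idle_wave(busy=busy)
        # The front advances one rank per iteration, i.e. one rank per `busy`
        # time units, so its speed in ranks per unit time scales as 1 / busy.
        print(busy, front_positions(finish, busy, delayed_rank=10))
```

    In this toy model the delay front moves outward by exactly one rank per iteration regardless of `busy`, so the speed measured in ranks per unit time falls as the busy time grows, which is consistent with the inverse proportionality stated in the abstract.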

  • 68. Narasimhamurthy, S.
    et al.
    Danilov, N.
    Wu, S.
    Umanesan, G.
    Chien, Steven Wei Der
    KTH.
    Rivas-Gomez, Sergio
    KTH.
    Peng, Ivy Bo
    KTH.
    Laure, Erwin
    KTH.
    De Witt, S.
    Pleiter, D.
    Markidis, Stefano
    KTH.
    The SAGE project: A storage centric approach for exascale computing. 2018. In: 2018 ACM International Conference on Computing Frontiers, CF 2018 - Proceedings, Association for Computing Machinery (ACM), 2018, pp. 287-292. Conference paper (Refereed)
    Abstract [en]

    SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with an associated software stack. The SAGE system follows a storage centric approach as it is capable of storing and processing large data volumes at the Exascale regime. SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been implemented and installed at the Jülich Supercomputing Center. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts. The objective of this paper is to present the SAGE project concepts and the prototype of the SAGE platform, and to discuss the software architecture of the SAGE system.

  • 69.
    Narasimhamurthy, Sai
    et al.
    Seagate Syst UK, London, England.
    Danilov, Nikita
    Seagate Syst UK, London, England.
    Wu, Sining
    Seagate Syst UK, London, England.
    Umanesan, Ganesan
    Seagate Syst UK, London, England.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Rivas-Gomez, Sergio
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Pleiter, Dirk
    Julich Supercomp Ctr, Julich, Germany.
    de Witt, Shaun
    Culham Ctr Fus Energy, Abingdon, Oxon, England.
    SAGE: Percipient Storage for Exascale Data Centric Computing. 2019. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 83, pp. 22-33. Journal article (Refereed)
    Abstract [en]

    We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analysed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.

  • 70. Olshevsky, V.
    et al.
    Lapenta, G.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Energetics of Kinetic Reconnection in a Three-Dimensional Null-Point Cluster. 2013. In: Physical Review Letters, ISSN 0031-9007, E-ISSN 1079-7114, Vol. 111, no. 4, p. 045002. Journal article (Refereed)
    Abstract [en]

    We perform three-dimensional particle-in-cell simulations of magnetic reconnection with multiple magnetic null points. Magnetic field energy conversion into kinetic energy is about five times higher than in the traditional Harris sheet configuration. More than 85% of the initial magnetic field energy is transferred to particle energy during 25 reversed ion cyclofrequencies. Magnetic reconnection in the cluster of null points evolves in three phases. During the first phase, ion beams are excited; they then give part of their energy back to the magnetic field in the second phase. In the third phase, magnetic reconnection occurs in many small patches around the current channels formed along the stripes of a low magnetic field. Magnetic reconnection in null points essentially presents three-dimensional features, with no two-dimensional symmetries or current sheets.

  • 71. Olshevsky, Vyacheslav
    et al.
    Deca, Jan
    Divin, Andrey
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Innocenti, Maria Elena
    Cazzola, Emanuele
    Lapenta, Giovanni
    Magnetic Null Points In Kinetic Simulations of Space Plasmas. 2016. In: Astrophysical Journal, ISSN 0004-637X, E-ISSN 1538-4357, Vol. 819, no. 1, article id 52. Journal article (Refereed)
    Abstract [en]

    We present a systematic attempt to study magnetic null points and the associated magnetic energy conversion in kinetic particle-in-cell simulations of various plasma configurations. We address three-dimensional simulations performed with the semi-implicit kinetic electromagnetic code iPic3D in different setups: variations of a Harris current sheet, dipolar and quadrupolar magnetospheres interacting with the solar wind, and a relaxing turbulent configuration with multiple null points. Spiral nulls are more likely created in space plasmas: in all our simulations except lunar magnetic anomaly (LMA) and quadrupolar mini-magnetosphere the number of spiral nulls prevails over the number of radial nulls by a factor of 3-9. We show that often magnetic nulls do not indicate the regions of intensive energy dissipation. Energy dissipation events caused by topological bifurcations at radial nulls are rather rare and short-lived. The so-called X-lines formed by the radial nulls in the Harris current sheet and LMA simulations are rather stable and do not exhibit any energy dissipation. Energy dissipation is more powerful in the vicinity of spiral nulls enclosed by magnetic flux ropes with strong currents at their axes (their cross sections resemble 2D magnetic islands). These null lines reminiscent of Z-pinches efficiently dissipate magnetic energy due to secondary instabilities such as the two-stream or kinking instability, accompanied by changes in magnetic topology. Current enhancements accompanied by spiral nulls may signal magnetic energy conversion sites in the observational data.

  • 72. Olshevsky, Vyacheslav
    et al.
    Divin, Andrey
    Eriksson, Elin
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, Giovanni
    ENERGY DISSIPATION IN MAGNETIC NULL POINTS AT KINETIC SCALES. 2015. In: Astrophysical Journal, ISSN 0004-637X, E-ISSN 1538-4357, Vol. 807, no. 2, article id 155. Journal article (Refereed)
    Abstract [en]

    We use kinetic particle-in-cell and MHD simulations supported by an observational data set to investigate magnetic reconnection in clusters of null points in space plasma. The magnetic configuration under investigation is driven by fast adiabatic flux rope compression that dissipates almost half of the initial magnetic field energy. In this phase powerful currents are excited producing secondary instabilities, and the system is brought into a state of "intermittent turbulence" within a few ion gyro-periods. Reconnection events are distributed all over the simulation domain and energy dissipation is rather volume-filling. Numerous spiral null points interconnected via their spines form null lines embedded into magnetic flux ropes; null point pairs demonstrate the signatures of torsional spine reconnection. However, energy dissipation mainly happens in the shear layers formed by adjacent flux ropes with oppositely directed currents. In these regions radial null pairs are spontaneously emerging and vanishing, associated with electron streams and small-scale current sheets. The number of spiral nulls in the simulation outweighs the number of radial nulls by a factor of 5-10, in accordance with Cluster observations in the Earth's magnetosheath. Twisted magnetic fields with embedded spiral null points might indicate the regions of major energy dissipation for future space missions such as the Magnetospheric Multiscale Mission.

  • 73. Olshevsky, Vyacheslav
    et al.
    Lapenta, Giovanni
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Divin, Andrey
    Role of Z-pinches in magnetic reconnection in space plasmas. 2015. In: Journal of Plasma Physics, ISSN 0022-3778, E-ISSN 1469-7807, Vol. 81, no. 1, article id 325810105. Journal article (Refereed)
    Abstract [en]

    A widely accepted scenario of magnetic reconnection in collisionless space plasmas is the breakage of magnetic field lines in X-points. In the laboratory, reconnection is commonly studied in pinches, current channels embedded into twisted magnetic fields. No model of magnetic reconnection in space plasmas considers both null-points and pinches as peers. We have performed a particle-in-cell simulation of magnetic reconnection in a three-dimensional configuration where null-points are present initially, and Z-pinches are formed during the simulation along the lines of spiral null-points. The non-spiral null-points are more stable than spiral ones, and no substantial energy dissipation is associated with them. On the contrary, turbulent magnetic reconnection in the pinches causes the magnetic energy to decay at a rate of ~1.5% per ion gyro period. Dissipation in similar structures is a likely scenario in space plasmas with a large fraction of spiral null-points.

  • 74.
    Peng, Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Vaivads, A.
    Vencels, Juris
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Amaya, J.
    Divin, A.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, G.
    The formation of a magnetosphere with implicit Particle-in-Cell simulations. 2015. In: Procedia Computer Science, Elsevier, 2015, no. 1, pp. 1178-1187. Conference paper (Refereed)
    Abstract [en]

    We demonstrate the improvements to an implicit Particle-in-Cell code, iPic3D, using the example of a dipolar magnetic field immersed in a plasma flow, and show the formation of a magnetosphere. We address the problem of modelling multi-scale phenomena during the formation of a magnetosphere by implementing an adaptive sub-cycling technique to resolve the motion of particles located close to the magnetic dipole centre, where the magnetic field intensity is maximum. In addition, we implemented new open boundary conditions to model the inflow and outflow of plasma. We present the results of a global three-dimensional Particle-in-Cell simulation and discuss the performance improvements from the adaptive sub-cycling technique.

  • 75.
    Peng, I. B.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Gioiosa, R.
    Kestor, G.
    Cicotti, P.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Exploring the performance benefit of hybrid memory system on HPC environments. 2017. In: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 683-692, article id 7965110. Conference paper (Refereed)
    Abstract [en]

    Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ∼4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3× performance when compared to the performance obtained using only DRAM. On the contrary, applications with random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.

  • 76. Peng, I. B.
    et al.
    Gioiosa, R.
    Kestor, G.
    Vetter, J. S.
    Cicotti, P.
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Characterizing the performance benefit of hybrid memory system for HPC applications. 2018. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 76, pp. 57-69. Journal article (Refereed)
    Abstract [en]

    Heterogeneous memory systems that consist of multiple memory technologies are becoming common in high-performance computing environments. Modern processors and accelerators, such as the Intel Knights Landing (KNL) CPU and NVIDIA Volta GPU, feature small-size high-bandwidth memory near the compute cores and large-size normal-bandwidth memory that is connected off-chip. Theoretically, HBM can provide about four times higher bandwidth than conventional DRAM. However, many factors impact the actual performance improvement that an application can achieve on such a system. In this paper, we focus on the Intel KNL system and identify the most important factors affecting application performance, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. We use a set of representative applications from both scientific and data-analytics domains. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to three times the performance obtained using only DRAM. On the contrary, applications with an irregular memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. We also provide a memory-centric analysis of four applications, identify their major data objects, and correlate their characteristics with the performance improvement on the testbed.

  • 77.
    Peng, I. B.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    The cost of synchronizing imbalanced processes in message passing systems. 2015. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp. 408-417. Conference paper (Refereed)
    Abstract [en]

    Synchronization in message passing systems is achieved by communication among processes. System and architectural noise and different workloads cause processes to be imbalanced and to reach synchronization points at different times. Thus, both communication and imbalance impact the synchronization performance. In this paper, we study the algorithmic properties that allow the communication in synchronization to absorb the initial imbalance among processes. We quantify the imbalance absorption properties of different barrier algorithms using a LogP Monte Carlo simulator. We found that linear and f-way tournament barriers can absorb up to 95% of random exponential imbalance with the standard deviation equal to the communication time for one message. Dissemination, butterfly and pairwise exchange barriers, on the other hand, do not absorb imbalance but can effectively bound the post-barrier imbalance. We identify that synchronization transitions from communication-dominated to imbalance-dominated when the standard deviation of the imbalance distribution is more than twice the communication time for one message. In our study, f-way tournament barriers provided the best imbalance absorption rate and convenient communication time.
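
    The idea of a barrier absorbing imbalance can be sketched with a small Monte Carlo experiment. This is a deliberately simplified illustration and not the LogP simulator used in the paper: it assumes a linear barrier in which a root handles one arrival message at a time, each costing `msg_time`, and it draws arrival times from an exponential distribution whose standard deviation equals `msg_time`. The absorbed-fraction metric and all names below are assumptions made for illustration only.

```python
import random


def linear_barrier_completion(arrivals, msg_time):
    """Time at which a root that serially handles one message per rank is done."""
    t = 0.0
    for a in sorted(arrivals):
        # The root either is still busy (t) or waits for the next arrival (a).
        t = max(t, a) + msg_time
    return t


def mean_absorbed_fraction(n_procs=64, msg_time=1.0, trials=200, seed=0):
    """Average share of the arrival spread hidden behind the serialized messages.

    Hypothetical metric: 1 - (extra completion time) / (arrival spread), where
    the extra time is measured against a perfectly balanced barrier that costs
    n_procs * msg_time after the first arrival.
    """
    rng = random.Random(seed)
    balanced = n_procs * msg_time
    fractions = []
    for _ in range(trials):
        # An exponential distribution has standard deviation equal to its mean.
        arrivals = [rng.expovariate(1.0 / msg_time) for _ in range(n_procs)]
        spread = max(arrivals) - min(arrivals)
        done = linear_barrier_completion(arrivals, msg_time)
        extra = done - min(arrivals) - balanced
        fractions.append(1.0 - extra / spread)
    return sum(fractions) / trials


if __name__ == "__main__":
    print(mean_absorbed_fraction())
```

    Because late arrivals overlap with the messages the root is still processing, the completion time grows by less than the arrival spread; the fraction computed here always lies between 0 and 1 under these assumptions.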

  • 78.
    Peng, I. Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Johlander, A.
    Vaivads, A.
    Khotyaintsev, Y.
    Henri, P.
    Lapenta, G.
    Kinetic structures of quasi-perpendicular shocks in global particle-in-cell simulations. 2015. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 22, no. 9, article id 092109. Journal article (Refereed)
    Abstract [en]

    We carried out global Particle-in-Cell simulations of the interaction between the solar wind and a magnetosphere to study the kinetic collisionless physics in super-critical quasi-perpendicular shocks. After an initial simulation transient, a collisionless bow shock forms as a result of the interaction of the solar wind and a planetary magnetic dipole. The shock ramp has a thickness of approximately one ion skin depth and is followed by a trailing wave train in the shock downstream. At the downstream edge of the bow shock, whistler waves propagate along the magnetic field lines and the presence of electron cyclotron waves has been identified. A small part of the solar wind ion population is specularly reflected by the shock while a larger part is deflected and heated by the shock. Solar wind ions and electrons are heated in the perpendicular directions. Ions are accelerated in the perpendicular direction in the trailing wave train region. This work is an initial effort to study the electron and ion kinetic effects developed near the bow shock in a realistic magnetic field configuration.

  • 79.
    Peng, Ivy Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Gioiosa, Roberto
    Kestor, Gokcen
    Cicotti, Pietro
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    RTHMS: A Tool for Data Placement on Hybrid Memory System. 2017. In: ACM SIGPLAN NOTICES, ASSOC COMPUTING MACHINERY, 2017, Vol. 52, no. 9, pp. 82-91. Conference paper (Refereed)
    Abstract [en]

    Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all these characteristics within a single memory technology is expensive and hence future supercomputers will feature different memory technologies side-by-side. However, it is a complex task to program hybrid-memory systems and to identify the best object-to-memory mapping. We envision that programmers will probably resort to using default configurations that only require minimal interventions on the application code or system settings. In this work, we argue that intelligent, fine-grained data placement can achieve higher performance than default setups. We present an algorithm for data placement on hybrid-memory systems. Our algorithm is based on a set of single-object allocation rules and global data placement decisions. We also present RTHMS, a tool that implements our algorithm and provides recommendations about the object-to-memory mapping. Our experiments on a hybrid memory system, an Intel Knights Landing processor with DRAM and HBM, show that RTHMS is able to achieve higher performance than the default configuration. We believe that RTHMS will be a valuable tool for programmers working on complex hybrid-memory systems.

  • 80.
    Peng, Ivy Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Gioiosa, Roberto
    Kestor, Gokcen
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Preparing HPC Applications for the Exascale Era: A Decoupling Strategy. 2017. In: 2017 46th International Conference on Parallel Processing (ICPP), Institute of Electrical and Electronics Engineers (IEEE), 2017, article id 8025274. Conference paper (Refereed)
    Abstract [en]

    Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which might be inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism is effective in reducing the impact of load imbalance and increases the parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de-facto programming system on current supercomputers. We demonstrate the effectiveness of this strategy by decoupling the reduce, particle communication, halo exchange and I/O operations in a set of scientific and data-analytics applications. A performance evaluation on 8,192 processes of a Cray XC40 supercomputer shows that the proposed approach can achieve up to 4x performance improvement.

  • 81. Peng, Ivy Bo
    et al.
    Markidis, Stefano
    Gioiosa, Roberto
    Kestor, Gokcen
    Laure, Erwin
    MPI Streams for HPC Applications. 2017. In: New Frontiers in High Performance Computing and Big Data, IEEE, 2017. Chapter in book, part of anthology (Refereed)
  • 82.
    Peng, Ivy Bo
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Gioiosa, Roberto
    Pacific Northwest Natl Lab, Computat Sci & Math Div, Richland, WA 99352 USA.
    Kestor, Gokcen
    Pacific Northwest Natl Lab, Computat Sci & Math Div, Richland, WA 99352 USA.
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    MPI Streams for HPC Applications. 2017. In: NEW FRONTIERS IN HIGH PERFORMANCE COMPUTING AND BIG DATA / [ed] Fox, G.; Getov, V.; Grandinetti, L.; Joubert, G.; Sterling, T., IOS PRESS, 2017, pp. 75-92. Conference paper (Refereed)
    Abstract [en]

    Data streams are a sequence of data flowing between source and destination processes. Streaming is widely used for signal, image and video processing for its efficiency in pipelining and effectiveness in reducing demand for memory. The goal of this work is to extend the use of data streams to support both conventional scientific applications and emerging data analytics applications running on HPC platforms. We introduce an extension called MPIStream to the de-facto programming standard on HPC, MPI. MPIStream supports data streams either within a single application or among multiple applications. We present three use cases using MPI streams in HPC applications together with their parallel performance. We show the convenience of using MPI streams to support the needs from both traditional HPC and emerging data analytics applications running on supercomputers.

  • 83.
    Peng, Ivy Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Holmes, D.
    Bull, M.
    A Data streaming model in MPI. 2015. In: Proceedings of the 3rd ExaMPI Workshop at the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2015, ACM Digital Library, 2015. Conference paper (Refereed)
    Abstract [en]

    The data streaming model is an effective way to tackle the challenge of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining message-passing and streaming programming models. MPI, the de facto standard for programming on HPC, cannot intuitively express the communication pattern and the functional operations required in streaming models. In this work, we designed and implemented a data streaming library, MPIStream, atop MPI to allocate data producers and consumers, to stream data continuously or irregularly and to process data at run-time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure data processing rate. The performance of the library largely depends on the size of the stream element, the number of data producers and consumers and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved 200 GB/s processing rate on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring and threshold collective operations. © 2015 ACM.
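
    The producer/consumer streaming pattern described in this abstract can be sketched in a few lines. The example below does not use MPI or the MPIStream API (whose interface is not given here); it assumes Python threads and a bounded `queue.Queue` as a stand-in for the communication channel, and every name in it (`run_stream`, `_END`, the parameters) is hypothetical.

```python
import queue
import threading

_END = object()  # sentinel telling a consumer that the stream is finished


def run_stream(n_producers=2, n_consumers=2, elements_per_producer=100):
    """Stream integers from producers to consumers and return their total."""
    channel = queue.Queue(maxsize=16)  # bounded buffer applies back-pressure
    partial_sums = [0] * n_consumers   # one slot per consumer, so no locking

    def producer(rank):
        # Each producer emits its own contiguous range of stream elements.
        for k in range(elements_per_producer):
            channel.put(rank * elements_per_producer + k)

    def consumer(idx):
        # Elements are processed as they arrive rather than after a bulk transfer.
        while True:
            item = channel.get()
            if item is _END:
                break
            partial_sums[idx] += item

    consumers = [threading.Thread(target=consumer, args=(i,)) for i in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(r,)) for r in range(n_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        channel.put(_END)  # one sentinel per consumer, sent after all data
    for t in consumers:
        t.join()
    return sum(partial_sums)


if __name__ == "__main__":
    print(run_stream())
```

    The bounded queue keeps memory use small regardless of how many elements flow through, which mirrors the motivation for streaming; in an MPI setting the producer and consumer roles would be assigned to groups of processes instead of threads.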

  • 84.
    Peng, Ivy Bo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Vencels, Juris
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, Giovanni
    Divin, Andrey
    Vaivads, Andris
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Energetic particles in magnetotail reconnection. 2015. In: Journal of Plasma Physics, ISSN 0022-3778, E-ISSN 1469-7807, Vol. 81, article id 325810202. Journal article (Refereed)
    Abstract [en]

    We carried out a 3D fully kinetic simulation of Earth's magnetotail magnetic reconnection to study the dynamics of energetic particles. We developed and implemented a new relativistic particle mover in iPIC3D, an implicit Particle-in-Cell code, to correctly model the dynamics of energetic particles. Before the onset of magnetic reconnection, energetic electrons are found localized close to current sheet and accelerated by lower hybrid drift instability. During magnetic reconnection, energetic particles are found in the reconnection region along the x-line and in the separatrices region. The energetic electrons are first present in localized stripes of the separatrices and finally cover all the separatrix surfaces. Along the separatrices, regions with strong electron deceleration are found. In the reconnection region, two categories of electron trajectory are identified. First, part of the electrons are trapped in the reconnection region, bouncing a few times between the outflow jets. Second, part of the electrons pass the reconnection region without being trapped. Different from electrons, energetic ions are localized on the reconnection fronts of the outflow jets.

  • 85. Restante, A. L.
    et al.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, G.
    Intrator, T.
    Geometrical investigation of the kinetic evolution of the magnetic field in a periodic flux rope. 2013. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 20, no. 8, p. 082501. Journal article (Refereed)
    Abstract [en]

    Flux ropes are bundles of magnetic field wrapped around an axis. Many laboratory, space, and astrophysics processes can be represented using this idealized concept. Here, a massively parallel 3D kinetic simulation of a periodic flux rope undergoing the kink instability is studied. The focus is on the topology of the magnetic field and its geometric structures. The analysis considers various techniques such as Poincaré maps and the quasi-separatrix layer (QSL). These are used to highlight regions with expansion or compression and changes in the connectivity of magnetic field lines and consequently to outline regions where heating and current may be generated due to magnetic reconnection. The present study is, to our knowledge, the first QSL analysis of a fully kinetic 3D particle in cell simulation and focuses the existing QSL method of analysis to periodic systems.

  • 86.
    Rivas Gomez, Sergio
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Brabazon, K.
    Perks, O.
    Narasimhamurthy, S.
    Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. 2019. In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 921-927. Conference paper (Refereed)
    Abstract [en]

    In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process becomes unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.

  • 87.
    Rivas-Gomez, Sergio
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Gioiosa, Roberto
    Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA..
    Peng, Ivy Bo
    Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA..
    Kestor, Gokcen
    Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA..
    Narasimhamurthy, Sai
    Seagate Syst UK, Havant PO9 1SA, England..
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    MPI windows on storage for HPC applications. 2018. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 77, pp. 38-56. Journal article (Refereed)
    Abstract [en]

    Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we explore the integration of heterogeneous window allocations, where memory and storage share a unified virtual address space. When performing large, irregular memory operations, we verify that MPI windows on local storage incurs a 55% performance penalty on average. When using a Lustre parallel file system, "asymmetric" performance is observed with over 90% degradation in writing operations. Nonetheless, experimental results of a Distributed Hash Table, the HACC I/O kernel mini-application, and a novel MapReduce implementation based on the use of MPI one-sided communication, indicate that the overall penalty of MPI windows on storage can be negligible in most cases in real-world applications.

  • 88.
    Rivas-Gomez, Sergio
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, E.
    Kestor, G.
    Gioiosa, R.
    Extending message passing interface windows to storage. 2017. In: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Institute of Electrical and Electronics Engineers Inc., 2017, pp. 728-730. Conference paper (Refereed)
    Abstract [en]

    This paper presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could potentially be helpful for a wide-range of use-cases and with low-overhead.

  • 89.
    Rivas-Gomez, Sergio
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Pena, A. J.
    Moloney, D.
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Exploring the vision processing unit as co-processor for inference. 2018. In: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, pp. 589-598, article id 8425465. Conference paper (Refereed)
    Abstract [en]

    The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore the so-called Vision Processing Unit (VPU), a highly-parallel vector processor with a power envelope of less than 1W. We evaluate this chip during inference using a pre-trained GoogLeNet convolutional network model and a large image dataset from the ImageNet ILSVRC challenge. Preliminary results indicate that a multi-VPU configuration provides similar performance compared to reference CPU and GPU implementations, while reducing the thermal-design power (TDP) up to 8x in comparison.

  • 90.
    Simmendinger, Christian
    et al.
    T Syst Solut Res, Stuttgart, Germany..
    Iakymchuk, Roman
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Cebamanos, Luis
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland..
    Akhmetova, Dana
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Bartsch, Valeria
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany..
    Rotaru, Tiberiu
    Fraunhofer ITWM, Kaiserslautern, Germany..
    Rahn, Mirko
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany..
    Laure, Erwin
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden.;KTH Royal Inst Technol, PDC Ctr, High Performance Comp Ctr, Stockholm, Sweden..
    Markidis, Stefano
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST). KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden..
    Interoperability strategies for GASPI and MPI in large-scale scientific applications. 2019. In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no. 3, pp. 554-568. Journal article (Refereed)
    Abstract [en]

    One of the main hurdles of partitioned global address space (PGAS) approaches is the dominance of message passing interface (MPI), which as a de facto standard appears in the code basis of many applications. To take advantage of the PGAS APIs like global address space programming interface (GASPI) without a major change in the code basis, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication/performance crucial parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.

  • 91. Stasiewicz, K.
    et al.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Eliasson, B.
    Strumik, M.
    Yamauchi, M.
    Acceleration of solar wind ions to 1 MeV by electromagnetic structures upstream of the Earth's bow shock. 2013. In: Europhysics letters, ISSN 0295-5075, E-ISSN 1286-4854, Vol. 102, no. 4, p. 49001. Journal article (Refereed)
    Abstract [en]

    We present measurements from the ESA/NASA Cluster mission that show in situ acceleration of ions to energies of 1MeV outside the bow shock. The observed heating can be associated with the presence of electromagnetic structures with strong spatial gradients of the electric field that lead to ion gyro-phase breaking and to the onset of chaos in ion trajectories. It results in rapid, stochastic acceleration of ions in the direction perpendicular to the ambient magnetic field. The electric potential of the structures can be compared to a field of moguls on a ski slope, capable of accelerating and ejecting the fast running skiers out of piste. This mechanism may represent the universal mechanism for perpendicular acceleration and heating of ions in the magnetosphere, the solar corona and in astrophysical plasmas. This is also a basic mechanism that can limit steepening of nonlinear electromagnetic structures at shocks and foreshocks in collisionless plasmas.

  • 92.
    Thoman, Peter
    et al.
    Univ Innsbruck, A-6020 Innsbruck, Austria..
    Dichev, Kiril
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland..
    Heller, Thomas
    Univ Erlangen Nurnberg, D-91058 Erlangen, Germany..
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Hasanov, Khalid
    IBM Ireland, Dublin 15, Ireland..
    Gschwandtner, Philipp
    Univ Innsbruck, A-6020 Innsbruck, Austria..
    Lemarinier, Pierre
    IBM Ireland, Dublin 15, Ireland..
    Markidis, Stefano
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH Royal Inst Technol, S-10044 Stockholm, Sweden..
    Jordan, Herbert
    Univ Innsbruck, A-6020 Innsbruck, Austria..
    Fahringer, Thomas
    Univ Innsbruck, A-6020 Innsbruck, Austria..
    Katrinis, Kostas
    IBM Ireland, Dublin 15, Ireland..
    Laure, Erwin
    KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland..
    A taxonomy of task-based parallel programming technologies for high-performance computing. 2018. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no. 4, pp. 1422-1434. Journal article (Refereed)
    Abstract [en]

    Task-based programming models for shared memory - such as Cilk Plus and OpenMP 3 - are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 93.
    Thoman, Peter
    et al.
    Univ Innsbruck, Innsbruck, Austria..
    Hasanov, Khalid
    IBM Ireland, Dublin, Ireland..
    Dichev, Kiril
    Queens Univ Belfast, Belfast, Antrim, North Ireland..
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH.
    Gschwandtner, Philipp
    Univ Innsbruck, Innsbruck, Austria..
    Lemarinier, Pierre
    IBM Ireland, Dublin, Ireland..
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Jordan, Herbert
    Univ Innsbruck, Innsbruck, Austria..
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Katrinis, Kostas
    IBM Ireland, Dublin, Ireland..
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast, Antrim, North Ireland..
    Fahringer, Thomas
    Univ Innsbruck, Innsbruck, Austria..
    A Taxonomy of Task-Based Technologies for High-Performance Computing. 2018. In: PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2017), PT II / [ed] Wyrzykowski, R; Dongarra, J; Deelman, E; Karczewski, K, SPRINGER INTERNATIONAL PUBLISHING AG, 2018, pp. 264-274. Conference paper (Refereed)
    Abstract [en]

    Task-based programming models for shared memory - such as Cilk Plus and OpenMP 3 - are well established and documented. However, with the increase in heterogeneous, many-core and parallel systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing, no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

  • 94. Toth, Gabor
    et al.
    Chen, Yuxi
    Gombosi, Tamas I.
    Cassak, Paul
    Markidis, Stefano
    KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Peng, Ivy Bo
    KTH.
    Scaling the Ion Inertial Length and Its Implications for Modeling Reconnection in Global Simulations. 2017. In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 122, no. 10, pp. 10336-10355. Journal article (Refereed)
    Abstract [en]

    We investigate the use of artificially increased ion and electron kinetic scales in global plasma simulations. We argue that as long as the global and ion inertial scales remain well separated, (1) the overall global solution is not strongly sensitive to the value of the ion inertial scale, while (2) the ion inertial scale dynamics will also be similar to the original system, but it occurs at a larger spatial scale, and (3) structures at intermediate scales, such as magnetic islands, grow in a self-similar manner. To investigate the validity and limitations of our scaling hypotheses, we carry out many simulations of a two-dimensional magnetosphere with the magnetohydrodynamics with embedded particle-in-cell (MHD-EPIC) model. The PIC model covers the dayside reconnection site. The simulation results confirm that the hypotheses are true as long as the increased ion inertial length remains less than about 5% of the magnetopause standoff distance. Since the theoretical arguments are general, we expect these results to carry over to three dimensions. The computational cost is reduced by the third and fourth powers of the scaling factor in two- and three-dimensional simulations, respectively, which can be many orders of magnitude. The present results suggest that global simulations that resolve kinetic scales for reconnection are feasible. This is a crucial step for applications to the magnetospheres of Earth, Saturn, and Jupiter and to the solar corona.

  • 95. Toth, Gabor
    et al.
    Jia, Xianzhe
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Chen, Yuxi
    Daldorff, Lars K. S.
    Tenishev, Valeriy M.
    Borovikov, Dmitry
    Haiducek, John D.
    Gombosi, Tamas I.
    Glocer, Alex
    Dorelli, John C.
    Extended magnetohydrodynamics with embedded particle-in-cell simulation of Ganymede's magnetosphere. 2016. In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 121, no. 2, pp. 1273-1293. Journal article (Refereed)
    Abstract [en]

    We have recently developed a new modeling capability to embed the implicit particle-in-cell (PIC) model iPIC3D into the Block-Adaptive-Tree-Solarwind-Roe-Upwind-Scheme magnetohydrodynamic (MHD) model. The MHD with embedded PIC domains (MHD-EPIC) algorithm is a two-way coupled kinetic-fluid model. As one of the very first applications of the MHD-EPIC algorithm, we simulate the interaction between Jupiter's magnetospheric plasma and Ganymede's magnetosphere. We compare the MHD-EPIC simulations with pure Hall MHD simulations and compare both model results with Galileo observations to assess the importance of kinetic effects in controlling the configuration and dynamics of Ganymede's magnetosphere. We find that the Hall MHD and MHD-EPIC solutions are qualitatively similar, but there are significant quantitative differences. In particular, the density and pressure inside the magnetosphere show different distributions. For our baseline grid resolution the PIC solution is more dynamic than the Hall MHD simulation and it compares significantly better with the Galileo magnetic measurements than the Hall MHD solution. The power spectra of the observed and simulated magnetic field fluctuations agree extremely well for the MHD-EPIC model. The MHD-EPIC simulation also produced a few flux transfer events (FTEs) that have magnetic signatures very similar to an observed event. The simulation shows that the FTEs often exhibit complex 3-D structures with their orientations changing substantially between the equatorial plane and the Galileo trajectory, which explains the magnetic signatures observed during the magnetopause crossings. The computational cost of the MHD-EPIC simulation was only about 4 times more than that of the Hall MHD simulation.

  • 96. Vapirev, A.
    et al.
    Deca, J.
    Lapenta, G.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hur, I.
    Cambier, J. -L
    Initial results on computational performance of Intel many integrated core, sandy bridge, and graphical processing unit architectures: implementation of a 1D c++/OpenMP electrostatic particle-in-cell code. 2015. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 27, no. 3, pp. 581-593. Journal article (Refereed)
    Abstract [en]

    We present initial comparison performance results for Intel many integrated core (MIC), Sandy Bridge (SB), and graphical processing unit (GPU). A 1D explicit electrostatic particle-in-cell code is used to simulate a two-stream instability in plasma. We compare the computation times for various number of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum thread number of 128. Parallelization and vectorization on the GPU is achieved with modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto-vectorization and optimization level compiler options. Our results show that the MIC is several times slower than SB for a single thread, and it becomes faster than SB when the number of cores increases with vectorization switched on. The compute times for the GPU are consistently about six to seven times faster than the ones for MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedup, however, for MIC and GPU are almost the same. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads where the speedup reaches a maximum.

  • 97. Vapirev, A. E.
    et al.
    Lapenta, G.
    Divin, A.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Henri, P.
    Goldman, M.
    Newman, D.
    Formation of a transient front structure near reconnection point in 3-D PIC simulations. 2013. In: Journal of Geophysical Research, ISSN 0148-0227, E-ISSN 2156-2202, Vol. 118, no. 4, pp. 1435-1449. Journal article (Refereed)
    Abstract [en]

    Massively parallel numerical simulations of magnetic reconnection are presented in this study. Electromagnetic full-particle implicit code iPIC3D is used to study the dynamics and 3-D evolution of reconnection outflows. Such features as Hall magnetic field, inflow and outflow, and diffusion region formation are very similar to 2-D particle-in-cell (PIC) simulations. In addition, it is well known that instabilities develop in the current flow direction or oblique directions. These modes could provide for anomalous resistivity and diffusive drag and can serve as additional proxies for magnetic reconnection. In our work, the unstable evolution of reconnection transient front structures is studied. Reconnection configuration in the absence of guide field is considered, and it is initialized with a localized perturbation aligned in the cross-tail direction. Our study suggests that the instabilities lead to the development of finger-like density structures on ion-electron hybrid scales. These structures are characterized by a rapid increase of the magnetic field, normal to the current sheet (Bz). A small decrease in the magnetic field component parallel to the reconnection X line and the component perpendicular to the current sheet is observed in the region ahead of the front. The instabilities form due to fact that the density gradient inside the front region is opposite to the direction of the acceleration Lorentz force. Such density structures may possibly further develop into larger-scale earthward flux transfer events during magnetotail reconnection. In addition, oscillations mainly in the magnetic and electric fields and the electron density are observed shortly before the arrival of the main front structure which is consistent with recent THEMIS observations. 
    Key Points: Three dimensional particle-in-cell simulation of reconnection in the magnetotail; Evolution of dipolarization front at reconnection and associated plasma flow; Development of instabilities in the plasma population

  • 98.
    Vencels, Juris
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Delzanno, G. L.
    Johnson, A.
    Peng, I. Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Spectral solver for multi-scale plasma physics simulations with dynamically adaptive number of moments. 2015. In: Procedia Computer Science, Elsevier, 2015, no. 1, pp. 1148-1157. Conference paper (Refereed)
    Abstract [en]

    A spectral method for kinetic plasma simulations based on the expansion of the velocity distribution function in a variable number of Hermite polynomials is presented. The method is based on a set of non-linear equations that is solved to determine the coefficients of the Hermite expansion satisfying the Vlasov and Poisson equations. In this paper, we first show that this technique combines the fluid and kinetic approaches into one framework. Second, we present an adaptive strategy to increase and decrease the number of Hermite functions dynamically during the simulation. The technique is applied to the Landau damping and two-stream instability test problems. Performance results show 21% and 47% saving of total simulation time in the Landau and two-stream instability test cases, respectively.

  • 99.
    Yu, Yiqun
    et al.
    Beihang Univ, Sch Space & Environm, Beijing, Peoples R China..
    Delzanno, Gian Luca
    Los Alamos Natl Lab, Los Alamos, NM USA..
    Jordanova, Vania
    Los Alamos Natl Lab, Los Alamos, NM USA..
    Peng, Ivy Bo
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    PIC simulations of wave-particle interactions with an initial electron velocity distribution from a kinetic ring current model. 2018. In: Journal of Atmospheric and Solar-Terrestrial Physics, ISSN 1364-6826, E-ISSN 1879-1824, Vol. 177, pp. 169-178. Journal article (Refereed)
    Abstract [en]

    Whistler wave-particle interactions play an important role in the Earth inner magnetospheric dynamics and have been the subject of numerous investigations. By running a global kinetic ring current model (RAM-SCB) in a storm event occurred on Oct 23-24 2002, we obtain the ring current electron distribution at a selected location at MLT of 9 and L of 6 where the electron distribution is composed of a warm population in the form of a partial ring in the velocity space (with energy around 15 keV) in addition to a cool population with a Maxwellian-like distribution. The warm population is likely from the injected plasma sheet electrons during substorm injections that supply fresh source to the inner magnetosphere. These electron distributions are then used as input in an implicit particle-in-cell code (iPIC3D) to study whistler-wave generation and the subsequent wave-particle interactions. We find that whistler waves are excited and propagate in the quasi-parallel direction along the background magnetic field. Several different wave modes are instantaneously generated with different growth rates and frequencies. The wave mode at the maximum growth rate has a frequency around 0.62 omega(ce), which corresponds to a parallel resonant energy of 2.5 keV. Linear theory analysis of wave growth is in excellent agreement with the simulation results. These waves grow initially due to the injected warm electrons and are later damped due to cyclotron absorption by electrons whose energy is close to the resonant energy and can effectively attenuate waves. The warm electron population overall experiences net energy loss and anisotropy drop while moving along the diffusion surfaces towards regions of lower phase space density, while the cool electron population undergoes heating when the waves grow, suggesting the cross-population interactions.
