kth.se Publications
Search results 1-50 of 178
  • 1.
    Afzal, Ayesha
    et al.
    Erlangen Natl High Performance Comp Ctr NHR FAU, D-91058 Erlangen, Germany; Friedrich Alexander Univ Erlangen Nurnberg, Dept Comp Sci, D-91058 Erlangen, Germany.
    Hager, Georg
    Erlangen Natl High Performance Comp Ctr NHR FAU, D-91058 Erlangen, Germany.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Wellein, Gerhard
    Erlangen Natl High Performance Comp Ctr NHR FAU, D-91058 Erlangen, Germany; Friedrich Alexander Univ Erlangen Nurnberg, Dept Comp Sci, D-91058 Erlangen, Germany.
    Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives. 2023. In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 148, p. 472-487. Article in journal (Refereed)
    Abstract [en]

    Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI communication in memory-bound parallel programs on multicore clusters and how it can be facilitated. For instance, slowing down MPI processes by deliberate injection of delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful in spotting certain patterns in parallel execution that would easily go unnoticed with traditional tracing tools. We investigate five different microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers (D3Q19 and SPEChpc D2Q37), and the LULESH and HPCG proxy applications.
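    The core experiment above lends itself to a small demonstration. The following is a minimal sketch (not the authors' benchmark): an MPI-parallel STREAM-Triad-style loop in which one rank is deliberately slowed down by an injected delay, so the effect of such "noise" on synchronization can be observed. The mpi4py usage, array size, and delay value are all assumptions.

```python
# Minimal sketch (not the paper's benchmark): an MPI-augmented STREAM
# Triad in which one rank is deliberately slowed down, so the effect of
# injected delays on desynchronization can be observed.
# Array size and delay value are arbitrary choices.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

N = 10_000_000                       # per-rank working set (assumption)
a, b, c = np.zeros(N), np.ones(N), np.ones(N)
DELAY = 0.001                        # injected delay in seconds (assumption)

for step in range(100):
    if rank == 0:
        time.sleep(DELAY)            # deliberate slow-down of one process
    a[:] = b + 1.5 * c               # STREAM Triad kernel
    # a nearest-neighbor exchange stands in for the halo communication phase
    left, right = (rank - 1) % comm.size, (rank + 1) % comm.size
    comm.Sendrecv(a[:1], dest=right, recvbuf=c[-1:], source=left)

comm.Barrier()
if rank == 0:
    print("done; compare wall time with and without DELAY")
```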

  • 2.
    Afzal, Ayesha
    et al.
    Erlangen National High Performance Computing Center (NHR@FAU), 91058, Erlangen, Germany.
    Hager, Georg
    Erlangen National High Performance Computing Center (NHR@FAU), 91058, Erlangen, Germany.
    Wellein, Gerhard
    Erlangen National High Performance Computing Center (NHR@FAU), 91058, Erlangen, Germany; Department of Computer Science, University of Erlangen-Nürnberg, 91058, Erlangen, Germany.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. 2023. In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers, Springer Nature, 2023, p. 155-170. Conference paper (Refereed)
    Abstract [en]

    This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
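    To make the workflow concrete, here is a hedged sketch of that style of analysis on synthetic data: per-rank MPI-time traces are reduced with PCA and clustered to expose desynchronization. Every shape and value below is made up for illustration and is not from the paper's data set.

```python
# Sketch of the kind of analysis described above, on synthetic data:
# per-process MPI-time traces are reduced with PCA and clustered to
# expose desynchronization patterns. Shapes and values are invented.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
ranks, steps = 64, 500
# synthetic observable: MPI time per time step for each rank
traces = rng.normal(1.0, 0.05, (ranks, steps))
traces[32:] += np.linspace(0, 0.5, steps)   # a drifting, desynchronized half

coords = PCA(n_components=2).fit_transform(traces)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(coords)
print(labels)          # ranks 32..63 should separate from ranks 0..31
```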

  • 3.
    Aguilar, Xavier
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. 2021. In: 2021 IEEE International Conference On Cluster Computing (CLUSTER 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 692-697. Conference paper (Refereed)
    Abstract [en]

    We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) on the two-stream instability test. We verify that the DL-based MLP PIC method produces correct results on this test: it provides the expected growth rate of the two-stream instability. The DL-based PIC does not conserve total energy and momentum. However, it is stable against the cold-beam instability, which affects traditional PIC methods. This work shows that integrating DL technologies into traditional computational methods is a viable approach for developing next-generation PIC algorithms.
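    A conceptual sketch of the idea follows, with assumptions throughout: sklearn's MLPRegressor stands in for the paper's MLP/CNN, the training set is random placeholder data, and the PIC loop is a bare-bones 1D electrostatic step using a gridded density histogram (the paper uses the full phase space). The point is only that the conventional field solve is replaced by a learned map.

```python
# Conceptual sketch (assumptions throughout): a 1D electrostatic PIC step
# in which the usual Poisson solve is replaced by a learned regressor,
# echoing the DL-based PIC idea. The training data is random placeholder
# and would come from a conventional PIC run in practice.
import numpy as np
from sklearn.neural_network import MLPRegressor

cells, n_part = 64, 10_000
rng = np.random.default_rng(1)

X_train = rng.random((200, cells))            # gridded densities (placeholder)
y_train = rng.standard_normal((200, cells))   # matching fields (placeholder)
model = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300).fit(X_train, y_train)

x = rng.random(n_part)                 # particle positions in [0, 1)
v = rng.standard_normal(n_part)        # two-stream velocities would go here
dt = 0.1
for step in range(10):
    hist, _ = np.histogram(x, bins=cells, range=(0.0, 1.0))
    E = model.predict(hist[None, :])[0]            # learned field solve
    v += dt * E[(x * cells).astype(int)]           # gather + push
    x = (x + dt * v) % 1.0                         # periodic move
```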

  • 4. Akhmetova, D.
    et al.
    Kestor, G.
    Gioiosa, R.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    On the application task granularity and the interplay with the scheduling overhead in many-core shared memory systems. 2015. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, IEEE, 2015, p. 428-437. Conference paper (Refereed)
    Abstract [en]

    Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also incur large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may end up imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel task algorithm that analyzes an application directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2x10^4 and 10x10^4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
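    The aggregation step (point 1 above) can be illustrated with a toy sketch: fuse chains of fine-grained tasks until each aggregate reaches a target granularity in cycles. The DAG, task costs, and threshold below are invented for illustration, with the threshold chosen near the reported optimal range; the paper's actual algorithm operates on general DAGs, not a single chain.

```python
# Toy sketch of the task-aggregation idea: merge a linear chain of
# fine-grained tasks until each aggregate reaches a target granularity
# (in cycles). DAG, costs, and threshold are illustrative only.
TARGET = 12_000  # cycles, near the optimal range reported above

cost = {t: 1_500 for t in range(10)}          # fine-grained task costs
succ = {t: t + 1 for t in range(9)}           # chain: 0 -> 1 -> ... -> 9

aggregates, current, acc = [], [0], cost[0]
t = 0
while t in succ:
    nxt = succ[t]
    if acc + cost[nxt] <= TARGET:             # fuse while under threshold
        current.append(nxt)
        acc += cost[nxt]
    else:                                     # close the aggregate, start anew
        aggregates.append(current)
        current, acc = [nxt], cost[nxt]
    t = nxt
aggregates.append(current)
print(aggregates)   # e.g. [[0, ..., 7], [8, 9]] with the costs above
```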

  • 5.
    Akhmetova, Dana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Cebamanos, L.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Rotaru, T.
    Rahn, M.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bartsch, V.
    Simmendinger, C.
    Interoperability of GASPI and MPI in large scale scientific applications. 2018. In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Springer Verlag, 2018, p. 277-287. Conference paper (Refereed)
    Abstract [en]

    One of the main hurdles to a broad adoption of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change in the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by providing our study and preliminary performance results regarding interoperating GASPI and MPI on the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we draw a strategy for better coupling of both APIs.

  • 6.
    Al Ahad, Muhammed Abdullah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Simmendinger, Christian
    T Syst Solut Res GmbH, D-70563 Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows. 2018. In: PROCEEDINGS OF PAW-ATM18: 2018 IEEE/ACM PARALLEL APPLICATIONS WORKSHOP, ALTERNATIVES TO MPI (PAW-ATM), IEEE, 2018, p. 1-10. Conference paper (Refereed)
    Abstract [en]

    Collective operations are commonly used in various parts of scientific applications. Especially in strong scaling scenarios, collective operations can negatively impact the overall application performance: while the load per rank decreases with increasing core counts, the time spent in, e.g., barrier operations increases logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploiting the benefits of this extension, we deliver high-performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations clearly achieve 2x-4x performance improvements compared to the best performing MPI implementations for various data distributions.
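    The mechanism underlying these algorithms can be hinted at with a sketch. The paper's implementations use GASPI notified communication; the stand-in below uses plain MPI-3 shared-memory windows and locks (a related but different primitive) to reduce a value through a window shared by all ranks on a node. Sizes and synchronization choices are assumptions.

```python
# Hedged sketch: an intra-node reduction through an MPI-3 shared-memory
# window, in the spirit of the shared-window collectives above. The
# paper's algorithms use GASPI notified communication; plain MPI window
# locks stand in for it here.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)
rank, size = comm.Get_rank(), comm.Get_size()

itemsize = MPI.DOUBLE.Get_size()
win = MPI.Win.Allocate_shared(itemsize if rank == 0 else 0, itemsize, comm=comm)
buf, _ = win.Shared_query(0)                 # all ranks map rank 0's memory
result = np.ndarray(buffer=buf, dtype="d", shape=(1,))

if rank == 0:
    result[0] = 0.0
comm.Barrier()
win.Lock(0)                                  # serialize the accumulate
result[0] += float(rank)                     # each rank adds its value
win.Unlock(0)
comm.Barrier()
if rank == 0:
    print("sum of ranks:", result[0], "expected:", size * (size - 1) / 2)
```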

  • 7.
    Andersson, Måns I.
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Liu, Felix
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems. 2024. In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), Society for Industrial & Applied Mathematics (SIAM), 2024, p. 39-52. Conference paper (Refereed)
    Abstract [en]

    This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.
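    For readers unfamiliar with the accelerator itself, a minimal numpy sketch of Anderson acceleration for a generic fixed-point iteration x = g(x) follows; the PMHSS inner solve is omitted and g is a toy contraction, so this only illustrates the mixing step that AA-PMHSS builds on.

```python
# Minimal sketch of Anderson acceleration for x = g(x): keep a window of
# recent residuals, find mixing weights (summing to 1) that minimize the
# combined residual in the least-squares sense, and mix the g-values.
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=100, tol=1e-10):
    xs, gs = [x0], [g(x0)]
    x = gs[-1]                                   # one plain iteration first
    for _ in range(iters):
        xs.append(x); gs.append(g(x))
        xs, gs = xs[-m:], gs[-m:]                # sliding window of size m
        F = np.column_stack([gk - xk for xk, gk in zip(xs, gs)])  # residuals
        f0 = F[:, 0]
        if F.shape[1] > 1:
            D = F[:, 1:] - f0[:, None]           # eliminate the sum-to-1 constraint
            gamma, *_ = np.linalg.lstsq(D, -f0, rcond=None)
            alpha = np.concatenate(([1.0 - gamma.sum()], gamma))
        else:
            alpha = np.array([1.0])
        x = np.column_stack(gs) @ alpha          # mixed new iterate
        if np.linalg.norm(g(x) - x) < tol:
            break
    return x

# toy contraction: converges to the fixed point of x = cos(x)
print(anderson_fixed_point(np.cos, np.array([1.0])))
```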

  • 8.
    Andersson, Måns
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms. 2023. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: 2023, Association for Computing Machinery (ACM), 2023. Conference paper (Refereed)
    Abstract [en]

    With the emergence of new computer architectures, portability and performance-portability have become significant concerns for developing HPC applications. This work reports our experience and lessons learned using DaCe to create and optimize batched Discrete Fourier Transform (DFT) calculations on different single-node computer systems. The batched DFT calculation is an essential component of FFT algorithms and is widely used in computer science, numerical analysis, and signal processing. We implement the batched DFT with three complex-value array data layouts and compare them with the native complex-type implementation. We use DaCe, which relies on Stateful DataFlow multiGraphs (SDFG) as an intermediate representation (IR) that can be optimized through transformations and then used to generate code for different architectures. We present several performance results showcasing the potential of DaCe for expressing HPC applications on different computer systems.
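    The layout question can be made concrete with a small numpy sketch (not DaCe code): a batched DFT evaluated once on native complex arrays and once on split real/imaginary planes, the kind of complex-value layout choice compared above. Batch and transform sizes are arbitrary.

```python
# Illustrative sketch of a batched DFT under two complex-data layouts:
# native complex arrays versus split real/imaginary planes.
import numpy as np

batch, n = 1024, 64
k = np.arange(n)
W = np.exp(-2j * np.pi * np.outer(k, k) / n)   # dense DFT matrix

x = np.random.rand(batch, n) + 1j * np.random.rand(batch, n)

# layout 1: native complex type, one batched matmul
y_native = x @ W.T

# layout 2: split real/imag arrays, four real matmuls
xr, xi = x.real.copy(), x.imag.copy()
Wr, Wi = W.real, W.imag
yr = xr @ Wr.T - xi @ Wi.T
yi = xr @ Wi.T + xi @ Wr.T

assert np.allclose(y_native, yr + 1j * yi)     # layouts agree
assert np.allclose(y_native, np.fft.fft(x, axis=1))
```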

  • 9.
    Andersson, Måns
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Natarajan Arul, Murugan
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software. 2023. In: PPAM 2022, Lecture Notes in Computer Science, vol 13826, Springer Nature, 2023, p. 333-345. Conference paper (Refereed)
    Abstract [en]

    GROMACS is one of the most widely used HPC software packages employing the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact the GROMACS performance. Finally, we discuss performance-improvement opportunities, with particular interest in the GROMACS FFT calculations.

  • 10.
    Araújo De Medeiros, Daniel
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Peng, Ivy Bo
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    LibCOS: Enabling Converged HPC and Cloud Data Stores with MPI. 2023. In: Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2023, Association for Computing Machinery (ACM), 2023, p. 106-116. Conference paper (Refereed)
    Abstract [en]

    Federated HPC and cloud resources are becoming increasingly strategic for providing diversified and geographically available computing resources. However, accessing data stores across HPC and cloud storage systems is challenging. Many cloud providers use object storage systems to support their clients in storing and retrieving data over the internet. One popular method is REST APIs atop the HTTP protocol, with Amazon's S3 APIs being supported by most vendors. In contrast, HPC systems are contained within their own networks and tend to use parallel file systems with POSIX-like interfaces. This work addresses the challenge of diverse data stores on HPC and cloud systems by providing native object storage support through the unified MPI I/O interface in HPC applications. In particular, we provide a prototype library called LibCOS that transparently enables MPI applications running on HPC systems to access object storage on remote cloud systems. We evaluated LibCOS on a Ceph object storage system and a traditional HPC system. In addition, we conducted a performance characterization of core S3 operations that enable individual and collective MPI I/O. Our evaluation with HACC, IOR, and BigSort shows that enabling diverse data stores on HPC and cloud systems is feasible and can be achieved transparently through the widely adopted MPI I/O interface. We also show that a native object storage system like Ceph can improve the scalability of I/O operations in parallel applications.
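    The application-facing side is ordinary MPI I/O. The sketch below shows the kind of calls a library such as LibCOS could transparently redirect to an object store; the S3/Ceph plumbing is internal to the library and is not shown, and the file name is an arbitrary example.

```python
# Sketch of the unified interface the work above targets: plain MPI I/O
# calls, which a library like LibCOS can transparently map to object
# storage. The file name is a hypothetical example.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data = np.full(1024, rank, dtype="i")
amode = MPI.MODE_WRONLY | MPI.MODE_CREATE
fh = MPI.File.Open(comm, "checkpoint.dat", amode)

# collective write: each rank writes its own contiguous block
offset = rank * data.nbytes
fh.Write_at_all(offset, data)
fh.Close()
```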

  • 11.
    Atzori, Marco
    et al.
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Rezaei, Mohamad
    PDC Center for High Performance Computing, KTH Royal Institute of Technology.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. 2021. Report (Other academic)
    Abstract [en]

    In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible to handle, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with ParaView Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess the impact of in-situ visualization on Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulations.

  • 12.
    Atzori, Marco
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Rezaei, Mohammadtaghi
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Laure, E.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. 2022. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 78, no 3, p. 3605-3620. Article in journal (Refereed)
    Abstract [en]

    In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for Paraview Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH’s Beskow Cray XC40 supercomputer and assess in situ visualization’s impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.

  • 13. Beck, A.
    et al.
    Innocenti, M. E.
    Lapenta, G.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Multi-level multi-domain algorithm implementation for two-dimensional multiscale particle in cell simulations. 2014. In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 271, p. 430-443. Article in journal (Refereed)
    Abstract [en]

    There are a number of modeling challenges posed by space weather simulations. Most of them arise from the multiscale and multiphysics aspects of the problem. The multiple scales dramatically increase the requirements in terms of computational resources, because of the need to perform large-scale simulations with the proper small-scale resolution. Lately, several suggestions have been made to overcome this difficulty by using various refinement methods, which consist of splitting the domain into regions of different resolutions separated by well-defined interfaces. The multiphysics issues are generally treated in a similar way: interfaces separate the regions where different equations are solved. This paper presents an innovative approach based on the coexistence of several levels of description, which differ by their resolutions or, potentially, by their physics. Instead of interacting through interfaces, these levels are entirely simulated and are interlocked over the complete extension of the overlap area. This scheme has been applied to a parallelized, two-dimensional, Implicit Moment Method Particle-in-Cell code in order to investigate its multiscale description capabilities. Simulations of magnetic reconnection and plasma expansion in vacuum are presented, and possible implementation options for this scheme on very large systems are also discussed.

  • 14.
    Bragone, Federica
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Rosén, Tomas
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Fibre- and Polymer Technology. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Centres, Wallenberg Wood Science Center.
    Morozovska, Kateryna
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Industrial Engineering and Management (ITM), Industrial Economics and Management (Dept.), Sustainability, Industrial Dynamics & Entrepreneurship.
    Laneryd, Tor
    Hitachi Energy, Västerås, Sweden.
    Söderberg, Daniel
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Centres, Wallenberg Wood Science Center. KTH, School of Engineering Sciences (SCI), Engineering Mechanics. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Fibre- and Polymer Technology, Fiberprocesser.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Unsupervised Learning Analysis of Flow-Induced Birefringence in Nanocellulose: Differentiating Materials and Concentrations. Manuscript (preprint) (Other academic)
    Abstract [en]

    Cellulose nanofibrils (CNFs) can be used as building blocks for future sustainable materials, including strong and stiff filaments. The goal of this paper is to introduce a data analysis of flow-induced birefringence experiments by means of unsupervised learning techniques. By reducing the dimensionality of the data with Principal Component Analysis (PCA), we are able to extract information for the different cellulose materials at several concentrations and compare them to each other. Our approach aims at classifying the CNF materials at different concentrations by applying unsupervised machine learning algorithms such as k-means and Gaussian Mixture Models (GMMs). Finally, we analyze the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the first principal component, detecting seasonality at lower concentrations. The focus is on the initial relaxation of birefringence after the flow is stopped, to gain a better understanding of the Brownian dynamics for the given materials and concentrations.

    Our method can be used to distinguish the different materials at specific concentrations and could help to identify possible advantages and drawbacks of one material over the other. 
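    A hedged sketch of such a pipeline on synthetic placeholder data follows (sklearn and statsmodels assumed; nothing below comes from the manuscript's measurements):

```python
# Sketch of the analysis pipeline on synthetic placeholder data: PCA
# for dimensionality reduction, k-means and GMMs for grouping, and
# ACF/PACF of the first principal component.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(2)
# stand-in birefringence relaxation curves: one row per material/
# concentration experiment, one column per time step
curves = rng.standard_normal((30, 400)).cumsum(axis=1)

pcs = PCA(n_components=3).fit_transform(curves)       # reduce each curve
km_labels = KMeans(n_clusters=3, n_init=10).fit_predict(pcs)
gmm_labels = GaussianMixture(n_components=3).fit_predict(pcs)

# correlation structure of the first principal component over time
pc1_series = PCA(n_components=1).fit_transform(curves.T)[:, 0]
print(acf(pc1_series, nlags=20))
print(pacf(pc1_series, nlags=20))
```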

  • 15.
    Brown, Nick
    et al.
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Nash, Rupert
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Gibb, Gordon
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Belikov, Evgenij
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Flatken, Markus
    German Aerosp Ctr DLR, Braunschweig, Germany.
    Gerndt, Andreas
    German Aerosp Ctr DLR, Braunschweig, Germany.
    Workflows to Driving High-Performance Interactive Supercomputing for Urgent Decision Making. 2022. In: High Performance Computing, ISC High Performance 2022 International Workshops / [ed] Anzt, H., Bienz, A., Luszczek, P., Baboulin, M., Springer Nature, 2022, Vol. 13387, p. 233-244. Conference paper (Refereed)
    Abstract [en]

    Interactive urgent computing is a small but growing consumer of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows both from the perspective of marshalling and control of urgent workloads and at the individual HPC machine level, ultimately requiring two workflow systems. Using a space weather prediction urgent use case, we explore the benefits that these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.

  • 16.
    Brown, Nick
    et al.
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Nash, Rupert
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Poletti, Piero
    Bruno Kessler Fdn, Trento, Italy.
    Guzzetta, Giorgio
    Bruno Kessler Fdn, Trento, Italy.
    Manica, Mattia
    Bruno Kessler Fdn, Trento, Italy.
    Zardini, Agnese
    Bruno Kessler Fdn, Trento, Italy.
    Flatken, Markus
    German Aerosp Ctr DLR, Braunschweig, Germany.
    Vidal, Jules
    Sorbonne Univ, Paris, France.
    Gueunet, Charles
    Kitware, Lyon, France.
    Belikov, Evgenij
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Tierny, Julien
    Sorbonne Univ, Paris, France.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Gerndt, Andreas
    German Aerosp Ctr DLR, Braunschweig, Germany.
    Utilising urgent computing to tackle the spread of mosquito-borne diseases. 2021. In: Proceedings of UrgentHPC 2021: The Third International Workshop on HPC for Urgent Decision Making, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 36-44. Conference paper (Refereed)
    Abstract [en]

    It is estimated that around 80% of the world's population live in areas susceptible to at least one major vector-borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, the outbreaks of such diseases are becoming more common and widespread, with much of this driven in recent years by socio-demographic and climatic factors. These trends are causing significant worry to global health organisations, including the CDC and WHO, and so an important question is the role that technology can play in addressing them. In this work we describe the integration of an epidemiology model, which simulates the spread of mosquito-borne diseases, with the VESTEC urgent computing ecosystem. The intention of this work is to empower human health professionals to exploit this model and more easily explore the progression of mosquito-borne diseases. Traditionally the domain of a few research scientists, such models become far more accessible when state-of-the-art visualisation and analytics techniques are leveraged, all supported by running the computational workloads on HPC machines in a seamless fashion; we demonstrate the significant advantages that such an integration can provide. Furthermore, we demonstrate the benefits of using an ecosystem such as VESTEC, which provides a framework for urgent computing, in supporting the easy adoption of these technologies by epidemiologists and disaster response professionals more widely.

  • 17. Cazzola, E.
    et al.
    Curreli, D.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Lapenta, G.
    On the ions acceleration via collisionless magnetic reconnection in laboratory plasmas. 2016. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 23, no 11, article id 112108. Article in journal (Refereed)
    Abstract [en]

    This work presents an analysis of the ion outflow from magnetic reconnection through fully kinetic simulations with typical laboratory plasma values. A symmetric initial configuration for the density and magnetic field is considered across the current sheet. After analyzing the behavior of a set of nine simulations with a reduced mass ratio, permuting three initial electron temperatures and three magnetic field intensities, the best ion acceleration scenario is further studied with a realistic mass ratio in terms of the ion dynamics and energy budget. Interestingly, a series of shock wave structures are observed in the outflow, resembling the shock discontinuities found in recent magnetohydrodynamic simulations. An analysis of the ion outflow at several distances from the reconnection point is presented, in light of possible laboratory applications. The analysis suggests that magnetic reconnection could be used as a tool for plasma acceleration, with applications ranging from electric propulsion to the production of ion thermal beams.

  • 18. Cazzola, E.
    et al.
    Innocenti, M. E.
    Goldman, M. V.
    Newman, D. L.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Lapenta, G.
    Electrons dynamics in asymmetric magnetic reconnection and rapid island coalescence: Anisotropy and agyrotropy with and without a guide field. 2016. In: 43rd European Physical Society Conference on Plasma Physics, EPS 2016, European Physical Society (EPS), 2016. Conference paper (Refereed)
  • 19. Cazzola, E.
    et al.
    Innocenti, M. E.
    Goldman, M. V.
    Newman, D. L.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Lapenta, G.
    On the electron agyrotropy during rapid asymmetric magnetic island coalescence in presence of a guide field. 2016. In: Geophysical Research Letters, ISSN 0094-8276, E-ISSN 1944-8007, Vol. 43, no 15, p. 7840-7849. Article in journal (Refereed)
    Abstract [en]

    We present an analysis of the properties of the electron velocity distribution during island coalescence in asymmetric reconnection with and without guide field. In a previous study, three main domains were identified, in the case without guide field, as X, D, and M regions featuring different reconnection evolutions. These regions are also identified here in the case with guide field. We study the departure from isotropic and gyrotropic behavior by means of different robust detection algorithms proposed in the literature. While in the case without guide field these metrics show an overall agreement, when the guide field is present, a discrepancy in the agyrotropy within some relevant regions is observed, such as at the separatrices and inside magnetic islands. Moreover, in light of the new observations from the Multiscale MagnetoSpheric mission, an analysis of the electron velocity phase-space in these domains is presented.

  • 20. Cazzola, E.
    et al.
    Innocenti, M. E.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Goldman, M. V.
    Newman, D. L.
    Lapenta, G.
    On the electron dynamics during island coalescence in asymmetric magnetic reconnection. 2015. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 22, no 9, article id 092901. Article in journal (Refereed)
    Abstract [en]

    We present an analysis of the electron dynamics during rapid island merging in asymmetric magnetic reconnection. We consider a doubly periodic system with two asymmetric transitions. The upper layer is an asymmetric Harris sheet of finite width perturbed initially to promote a single reconnection site. The lower layer is a tangential discontinuity that promotes the formation of many X-points, separated by rapidly merging islands. Across both layers, the magnetic field and the density have a strong jump, but the pressure is held constant. Our analysis focuses on the consequences of electron energization during island coalescence. We focus first on the parallel and perpendicular components of the electron temperature to establish the presence of possible anisotropies and non-gyrotropies. Thanks to the direct comparison between the two different layers simulated, we can distinguish three main types of behavior characteristic of three different regions of interest. The first type represents regions where traditional asymmetric reconnection takes place without involving island merging. The second type instead shows reconnection events between two merging islands. Finally, the third type identifies the regions between two diverging islands, where the typical signature of reconnection is not observed. Electrons in these latter regions additionally show a flat-top distribution resulting from the saturation of a two-stream instability generated by the two interacting electron beams from the two nearest reconnection points. Finally, the analysis of agyrotropy shows the presence of a distinct double structure lying along the lower side facing the higher magnetic field region. This structure becomes quadrupolar in the proximity of the regions of the third type. The distinguishing features found for the three types of regions investigated provide clear indicators to the recently launched Magnetospheric Multiscale NASA mission for investigating magnetopause reconnection involving multiple islands.

  • 21. Chen, Y.
    et al.
    Tóth, G.
    Jia, X.
    Slavin, J. A.
    Sun, W.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Gombosi, T. I.
    Raines, J. M.
    Studying Dawn-Dusk Asymmetries of Mercury's Magnetotail Using MHD-EPIC Simulations. 2019. In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 124, no 11, p. 8954-8973. Article in journal (Refereed)
    Abstract [en]

    MESSENGER has observed many dawn-dusk asymmetries in Mercury's magnetotail, such as asymmetries in the cross-tail current sheet thickness and in the occurrence of flux ropes, dipolarization events, and energetic electron injections. In order to obtain a global picture of Mercury's magnetotail dynamics and the relationship between these asymmetries, we perform global simulations with the magnetohydrodynamics with embedded particle-in-cell (MHD-EPIC) model, where Mercury's magnetotail region is covered by a PIC code. Our simulations show that the dawnside current sheet is thicker, the plasma density is larger, and the electron pressure is higher than on the duskside. Under a strong interplanetary magnetic field driver, the simulated reconnection sites prefer the dawnside. We also find that dipolarization events and planetward electron jets move dawnward while moving toward the planet, so that almost all dipolarization events and high-speed plasma flows concentrate in the dawn sector. The simulation results are consistent with MESSENGER observations.

  • 22. Chen, Yuxi
    et al.
    Toth, Gabor
    Cassak, Paul
    Jia, Xianzhe
    Gombosi, Tamas I.
    Slavin, James A.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Peng, Ivy Bo
    KTH.
    Jordanova, Vania K.
    Henderson, Michael G.
    Global Three-Dimensional Simulation of Earth's Dayside Reconnection Using a Two-Way Coupled Magnetohydrodynamics With Embedded Particle-in-Cell Model: Initial Results. 2017. In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 122, no 10, p. 10318-10335. Article in journal (Refereed)
    Abstract [en]

    We perform a three-dimensional (3-D) global simulation of Earth's magnetosphere with kinetic reconnection physics to study the flux transfer events (FTEs) and dayside magnetic reconnection with the recently developed magnetohydrodynamics with embedded particle-in-cell model. During the 1 h long simulation, the FTEs are generated quasi-periodically near the subsolar point and move toward the poles. We find that the magnetic field signature of FTEs at their early formation stage is similar to a "crater FTE," which is characterized by a magnetic field strength dip at the FTE center. After the FTE core field grows to a significant value, it becomes an FTE with typical flux rope structure. When an FTE moves across the cusp, reconnection between the FTE field lines and the cusp field lines can dissipate the FTE. The kinetic features are also captured by our model. A crescent electron phase space distribution is found near the reconnection site. A similar distribution is found for ions at the location where the Larmor electric field appears. The lower hybrid drift instability (LHDI) along the current sheet direction also arises at the interface of magnetosheath and magnetosphere plasma. The LHDI electric field is about 8 mV/m, and its dominant wavelength relative to the electron gyroradius agrees reasonably with Magnetospheric Multiscale (MMS) observations.

  • 23.
    Chen, Yuxi
    et al.
    Univ Michigan, Dept Climate & Space Sci & Engn, Ann Arbor, MI 48109 USA.
    Toth, Gabor
    Univ Michigan, Dept Climate & Space Sci & Engn, Ann Arbor, MI 48109 USA.
    Hietala, Heli
    Univ Turku, Dept Phys & Astron, Space Res Lab, Turku, Finland; Univ Calif Los Angeles, Dept Earth Planetary & Space Sci, Los Angeles, CA USA; Imperial Coll London, Blackett Lab, London, England.
    Vines, Sarah K.
    Johns Hopkins Univ, Appl Phys Lab, Laurel, MD USA.
    Zou, Ying
    Univ Alabama, Ctr Space Plasma & Aeron Res, Huntsville, AL 35899 USA.
    Nishimura, Yukitoshi
    Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA; Boston Univ, Ctr Space Phys, Boston, MA 02215 USA.
    Silveira, Marcos V. D.
    NASA, Goddard Space Flight Ctr, Greenbelt, MD USA; Catholic Univ Amer, Washington, DC 20064 USA.
    Guo, Zhifang
    Auburn Univ, Dept Phys, Auburn, AL 36849 USA.
    Lin, Yu
    Auburn Univ, Dept Phys, Auburn, AL 36849 USA.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Magnetohydrodynamic With Embedded Particle-In-Cell Simulation of the Geospace Environment Modeling Dayside Kinetic Processes Challenge Event. 2020. In: Earth and Space Science, E-ISSN 2333-5084, Vol. 7, no 11, article id e2020EA001331. Article in journal (Refereed)
    Abstract [en]

    We use the magnetohydrodynamic (MHD) with embedded particle-in-cell model (MHD-EPIC) to study the Geospace Environment Modeling (GEM) dayside kinetic processes challenge event at 01:50-03:00 UT on 18 November 2015, when the magnetosphere was driven by a steady southward interplanetary magnetic field (IMF). In the MHD-EPIC simulation, the dayside magnetopause is covered by a PIC code so that the dayside reconnection is properly handled. We compare the magnetic fields and the plasma profiles of the magnetopause crossing with the MMS3 spacecraft observations. Most variables match the observations well in the magnetosphere, in the magnetosheath, and also during the current sheet crossing. The MHD-EPIC simulation produces flux ropes, and we demonstrate that some magnetic field and plasma features observed by the MMS3 spacecraft can be reproduced by a flux rope crossing event. We use an algorithm to automatically identify the reconnection sites from the simulation results. It turns out that there are usually multiple X-lines at the magnetopause. By tracing the locations of the X-lines, we find that the typical moving speed of the X-line endpoints is about 70 km/s, which is higher than but still comparable with the ground-based observations.

  • 24. Chien, Steven W. D.
    et al.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Sishtla, Chaitanya Prasad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Santos, Luis
    Herman, Pawel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Nrasimhamurthy, Sai
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Characterizing Deep-Learning I/O Workloads in TensorFlow. 2018. In: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 54-63. Conference paper (Refereed)
    Abstract [en]

    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small-capacity storage and asynchronously copy the checkpoints to a slower large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
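    The burst-buffer idea admits a compact illustration. The sketch below (paths and sizes are mock placeholders, not the paper's setup) checkpoints synchronously to a fast tier and drains asynchronously to a slow tier in a background thread.

```python
# Hedged sketch of the burst-buffer idea evaluated above: checkpoints go
# synchronously to a fast small-capacity tier, then drain asynchronously
# to slower large-capacity storage so training never blocks on the slow
# tier. Both paths are mock placeholders.
import os
import shutil
import threading
import numpy as np

FAST = "/tmp/burst_buffer"       # stand-in for fast node-local storage
SLOW = "/tmp/slow_tier_mock"     # stand-in for the slow parallel file system
os.makedirs(FAST, exist_ok=True)
os.makedirs(SLOW, exist_ok=True)

def checkpoint(step, weights):
    """Write to the fast tier, then copy to the slow tier in the background."""
    fast_path = os.path.join(FAST, f"ckpt_{step}.npz")
    np.savez(fast_path, **weights)                    # fast, synchronous
    t = threading.Thread(target=shutil.copy, args=(fast_path, SLOW))
    t.start()                                         # asynchronous drain
    return t

pending = [checkpoint(s, {"w": np.random.rand(1000)}) for s in range(3)]
for t in pending:
    t.join()    # ensure all checkpoints reached the slow tier before exit
```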

  • 25.
    Chien, Steven W. D.
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Peng, I. B.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads. 2020. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, Institute of Electrical and Electronics Engineers Inc., 2020, p. 359-370. Conference paper (Refereed)
    Abstract [en]

    Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend TensorFlow Profiler and introduce tf-Darshan, both a profiler and tracer, that performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement a runtime attachment without using a system preload. We can extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow profiler. We visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. At the same time, we do not alter Darshan's existing implementation. We illustrate tf-Darshan by performing two case studies on ImageNet image and Malware classification. We show that by guiding optimization using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast tier storage. We also show that Darshan has the potential of being used as a runtime library for profiling and providing information for future optimization.

  • 26.
    Chien, Steven W.D.
    et al.
    University of Edinburgh, United Kingdom.
    Sato, Kento
    RIKEN Center for Computational Science, Japan.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Honda, Michio
    University of Edinburgh, United Kingdom.
    Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications. 2023. In: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023, Association for Computing Machinery (ACM), 2023, p. 172-173. Conference paper (Refereed)
    Abstract [en]

    Cloud providers began to provide managed services to attract scientific applications, which have traditionally been executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications, the MPI library, or support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application, Neko, by 13-40%.

  • 27.
    Chien, Steven Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Olshevsky, Vyacheslav
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bulatov, Yaroslav
    South Pk Commons, San Francisco, CA USA.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Vetter, Jeffrey S.
    Oak Ridge Natl Lab, Oak Ridge, TN USA.
    TensorFlow Doing HPC: An Evaluation of TensorFlow Performance in HPC Applications. 2019. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 509-518. Conference paper (Refereed)
    Abstract [en]

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it aims at supporting a much broader range of application kinds that are outside the ML domain, possibly including HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, a Conjugate Gradient (CG) solver, and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high-performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find approximately 2x, 1.7x, and 1.8x performance improvements when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG, and FFT applications, respectively. All our performance results demonstrate that TensorFlow has high potential to emerge also as an HPC programming framework for heterogeneous supercomputers.
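    As a flavor of what such benchmarks look like, here is a hedged sketch of a Conjugate Gradient solve written directly in TensorFlow 2.x ops, so the same code can run on CPUs or GPU accelerators; the SPD matrix is a small random placeholder, not the paper's benchmark configuration.

```python
# Sketch of a CG solver expressed with TensorFlow ops (eager mode).
# The matrix is a small random SPD placeholder for illustration.
import tensorflow as tf

n = 512
A_raw = tf.random.normal((n, n), dtype=tf.float64)
A = A_raw @ tf.transpose(A_raw) + n * tf.eye(n, dtype=tf.float64)  # SPD
b = tf.random.normal((n, 1), dtype=tf.float64)

x = tf.zeros_like(b)
r = b - A @ x
p = r
rs = tf.reduce_sum(r * r)
for _ in range(200):
    Ap = A @ p
    alpha = rs / tf.reduce_sum(p * Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    rs_new = tf.reduce_sum(r * r)
    if tf.sqrt(rs_new) < 1e-10:          # converged
        break
    p = r + (rs_new / rs) * p            # update search direction
    rs = rs_new

print("residual:", float(tf.norm(A @ x - b)))
```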

  • 28.
    Chien, Steven Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Sishtla, Chaitanya Prasad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Jun, Zhang
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Peng, Ivy Bo
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems2018In: Proceedings of the 5th International Conference on Exascale Applications and Software, The University of Edinburgh , 2018, p. 34-Conference paper (Refereed)
    Abstract [en]

    Computationally intensive applications such as pattern recognition and natural language processing are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of the artificial neural network. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices which flow between operations are called tensors. For this reason, computational problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices such as GPUs and CPUs on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen, and cuBLAS. Inter-node communication can be through TCP and RDMA. This work attempts to evaluate the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device, and a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary results with distributed matrix multiplication show that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC.
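
    The blocking idea behind the distributed prototype can be shown without TensorFlow. The plain-C tiled multiply below illustrates only the decomposition: each tile product is an independent unit of work that a distributed graph would assign to a different device. Matrix and tile sizes are illustrative; this is not the paper's implementation.

    ```c
    /* Block (tiled) matrix multiply: C += A * B computed one tile at a time.
     * Each (ib, jb, kb) tile product is independent work that a distributed
     * framework could place on a separate device.  N and BS are illustrative. */
    #include <stdio.h>

    #define N  256
    #define BS 64

    static double A[N][N], B[N][N], C[N][N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 2.0; }

        for (int ib = 0; ib < N; ib += BS)          /* tile row of C */
            for (int jb = 0; jb < N; jb += BS)      /* tile column of C */
                for (int kb = 0; kb < N; kb += BS)  /* tile product to add */
                    for (int i = ib; i < ib + BS; i++)
                        for (int k = kb; k < kb + BS; k++)
                            for (int j = jb; j < jb + BS; j++)
                                C[i][j] += A[i][k] * B[k][j];

        printf("C[0][0] = %.1f (expected %.1f)\n", C[0][0], 2.0 * N);
        return 0;
    }
    ```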

  • 29.
    Chien, Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Nylund, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bengtsson, Gabriel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Peng, I. B.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    SputniPIC: An implicit particle-in-cell code for multi-GPU systems2020In: Proceedings - Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society , 2020, p. 149-156Conference paper (Refereed)
    Abstract [en]

    Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes require new algorithm design and implementation to exploit such accelerated platforms. In this work, we design and optimize a three-dimensional implicit PIC code, called sputniPIC, to run on a general multi-GPU compute node. We introduce a particle decomposition data layout, in contrast to the domain decomposition of CPU-based implementations, to use particle batches for overlapping communication and computation on GPUs. sputniPIC also natively supports different precision representations to achieve speedup on hardware that supports reduced precision. We validate sputniPIC through the well-known GEM challenge and provide performance analysis. We test sputniPIC on three multi-GPU platforms and report a 200-800x performance improvement over the CPU OpenMP version of sputniPIC. We show that reduced precision could further improve performance by 45% to 80% on the three platforms. Because of these performance improvements, sputniPIC enables, on a single node with multiple GPUs, large-scale three-dimensional PIC simulations that were previously only possible on clusters.
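
    The abstract does not detail the batching scheme, so the sketch below only illustrates the general idea of particle batches enqueued on a CUDA stream: transfers and a toy mover kernel are issued asynchronously so the host can prepare the next batch while the GPU works. The kernel body, sizes, and single-stream layout are illustrative assumptions, not sputniPIC's actual structure; a production code would double-buffer across streams to also overlap transfers with compute.

    ```c
    /* Illustrative particle-batch pipeline on one CUDA stream. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void move_particles(float *x, const float *v, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += v[i] * dt;           /* stand-in for the PIC mover */
    }

    int main(void)
    {
        const int nbatch = 4, n = 1 << 20;      /* illustrative sizes */
        float *hx, *hv, *dx, *dv;
        cudaMallocHost((void **)&hx, (size_t)nbatch * n * sizeof(float));
        cudaMallocHost((void **)&hv, (size_t)nbatch * n * sizeof(float));
        cudaMalloc((void **)&dx, n * sizeof(float));
        cudaMalloc((void **)&dv, n * sizeof(float));
        for (size_t i = 0; i < (size_t)nbatch * n; i++) { hx[i] = 0.f; hv[i] = 1.f; }

        cudaStream_t s;
        cudaStreamCreate(&s);
        for (int b = 0; b < nbatch; b++) {
            /* Copy one batch in, push it, copy it back; all calls are
             * asynchronous, so the host loop continues immediately. */
            cudaMemcpyAsync(dx, hx + (size_t)b * n, n * sizeof(float),
                            cudaMemcpyHostToDevice, s);
            cudaMemcpyAsync(dv, hv + (size_t)b * n, n * sizeof(float),
                            cudaMemcpyHostToDevice, s);
            move_particles<<<(n + 255) / 256, 256, 0, s>>>(dx, dv, n, 0.1f);
            cudaMemcpyAsync(hx + (size_t)b * n, dx, n * sizeof(float),
                            cudaMemcpyDeviceToHost, s);
        }
        cudaStreamSynchronize(s);
        printf("processed %d batches of %d particles\n", nbatch, n);
        cudaStreamDestroy(s);
        cudaFree(dx); cudaFree(dv); cudaFreeHost(hx); cudaFreeHost(hv);
        return 0;
    }
    ```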

  • 30.
    Chien, Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Peng, Ivy B.
    Lawrence Livermore National LaboratoryLivermoreUSA.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Posit NPB: Assessing the precision improvement in HPC scientific applications2020In: Lecture Notes in Computer Science, Springer , 2020, p. 301-310Conference paper (Refereed)
    Abstract [en]

    Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it can enable higher precision than IEEE formats using the same number of bits. In this work, we first explore the feasibility of Posit encoding in representative HPC applications by providing a 32-bit Posit NAS Parallel Benchmark (NPB) suite. We then evaluate the accuracy improvement in different HPC kernels compared to the IEEE 754 format. Our results indicate that using Posit encoding improves precision by 0.6 to 1.4 decimal digits for all tested kernels and proxy-applications. We also quantify the overhead of the current software implementation of Posit encoding as 4×–19× that of the IEEE 754 hardware implementation. Our study highlights the potential of hardware implementations of Posit to benefit a broad range of HPC applications.
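
    For context on the decimal-digit metric: accuracy in decimal digits is commonly quantified as -log10 of the relative error against a high-precision reference. Below is a minimal sketch of that measurement, assuming an ordinary float dot product as a stand-in for an NPB kernel; substituting a software Posit32 type for float would give the corresponding Posit figure. This is not the paper's harness.

    ```c
    /* Measure "decimal digits of accuracy" of a float reduction against a
     * double-precision reference.  The dot product is an illustrative kernel. */
    #include <math.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        float  s32 = 0.0f;
        double s64 = 0.0;
        for (int i = 1; i <= N; i++) {
            double x = 1.0 / i;                /* nontrivial data */
            s32 += (float)x * (float)x;
            s64 += x * x;
        }
        double relerr = fabs((double)s32 - s64) / fabs(s64);
        printf("decimal digits of accuracy: %.2f\n", -log10(relerr));
        return 0;
    }
    ```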

  • 31.
    Chien, Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Peng, Ivy
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Performance evaluation of advanced features in CUDA unified memory2019In: Proceedings of MCHPC 2019: Workshop on Memory Centric High Performance Computing - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers Inc. , 2019, p. 50-57Conference paper (Refereed)
    Abstract [en]

    CUDA Unified Memory improves the GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to evaluate both in-memory and oversubscription performance. The results show that memory advises on the Intel-Volta/Pascal-PCIe platform bring negligible improvement for in-memory executions. However, when GPU memory is oversubscribed by about 50%, using memory advises results in up to 25% performance improvement compared to the basic CUDA Unified Memory. In contrast, the Power9-Volta-NVLink platform can substantially benefit from memory advises, achieving up to 34% performance gain for in-memory executions. However, when GPU memory is oversubscribed on this platform, using memory advises increases GPU page faults and results in considerable performance loss. The CUDA prefetch also shows different performance impact on the two platforms. It improves performance by up to 50% on the Intel-Volta/Pascal-PCIe platform but brings little benefit to the Power9-Volta-NVLink platform.
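
    For readers unfamiliar with the two features under test, the host-code sketch below shows how they are invoked through the CUDA runtime API. The buffer size and the ReadMostly advice are illustrative choices; this is not the paper's benchmark suite.

    ```c
    /* Minimal use of CUDA Unified Memory advises and asynchronous prefetch. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        const size_t n = 1 << 26;              /* 64 Mi floats, illustrative */
        float *data;
        int dev = 0;

        cudaMallocManaged((void **)&data, n * sizeof(float));
        for (size_t i = 0; i < n; i++) data[i] = 1.0f;   /* first touch on host */

        /* Memory advise: mark the region read-mostly, so the driver may keep
         * read-only copies on both sides instead of migrating pages. */
        cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetReadMostly, dev);

        /* Asynchronous prefetch: migrate the pages to the GPU ahead of use,
         * instead of paying page faults at first access in a kernel. */
        cudaMemPrefetchAsync(data, n * sizeof(float), dev, 0);
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);
        cudaFree(data);
        return 0;
    }
    ```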

  • 32.
    Chien, Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Svedin, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Tkachuk, Andriy
    Seagate Systems UK.
    El Sayed, Salem
    Jülich Supercomputing Centre, Forschungszentrum Jülich.
    Herman, Pawel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Umanesan, Ganesan
    Seagate Systems UK.
    Narasimhamurthy, Sai
    Seagate Systems UK.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage2022In: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Institute of Electrical and Electronics Engineers (IEEE) , 2022Conference paper (Refereed)
    Abstract [en]

    Strong consistency and stateful workflows are seen as major factors limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite wide deployment in the cloud, its adoption in HPC remains low. We argue one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g., HDF5, binary) and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate's Motr object store through NoaSci. We evaluate NoaSci's preliminary performance using the iPIC3D space weather application and position it against existing I/O methods.
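
    NoaSci's interface is not shown in the abstract, so the sketch below only illustrates the general object-per-chunk idea behind a numerical object array: each rank writes its piece of a field under a structured key, with no shared-file locking. All names are invented for illustration, and a plain file stands in for the object store or burst buffer backend; this is not NoaSci's API.

    ```c
    /* Object-style parallel I/O: one object per (field, step, rank) chunk. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double chunk[1024];                 /* this rank's piece of the array */
        for (int i = 0; i < 1024; i++) chunk[i] = rank;

        /* The key encodes field name, timestep, and rank; an object store
         * would use it as the object name, a burst buffer as a local path. */
        char path[128];
        snprintf(path, sizeof path, "/tmp/rho-step0042-rank%05d.obj", rank);

        FILE *f = fopen(path, "wb");
        if (f) { fwrite(chunk, sizeof(double), 1024, f); fclose(f); }

        MPI_Finalize();
        return 0;
    }
    ```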

  • 33. Daldorff, Lars K. S.
    et al.
    Toth, Gabor
    Gombosi, Tamas I.
    Lapenta, Giovanni
    Amaya, Jorge
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Brackbill, Jeremiah U.
    Two-way coupling of a global Hall magnetohydrodynamics model with a local implicit particle-in-cell model2014In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 268, p. 236-254Article in journal (Refereed)
    Abstract [en]

    Computational models based on a fluid description of the plasma, such as magnetohydrodynamic (MHD) and extended magnetohydrodynamic (XMHD) codes, are highly efficient, but they miss the kinetic effects due to the assumptions of small gyro radius, charge neutrality, and Maxwellian thermal velocity distribution. Kinetic codes can properly take into account the kinetic effects, but they are orders of magnitude more expensive than the fluid codes due to the increased degrees of freedom. If the fluid description is acceptable in a large fraction of the computational domain, it makes sense to confine the kinetic model to the regions where kinetic effects are important. This coupled approach can be much more efficient than a pure kinetic model. The speedup is approximately the volume ratio of the full domain relative to the kinetic regions, assuming that the kinetic code uses a uniform grid. This idea has been advocated by [1], but their coupling was limited to one dimension and they employed drastically different grid resolutions in the fluid and kinetic models. We describe a fully two-dimensional two-way coupling of a Hall MHD model, BATS-R-US, with an implicit Particle-in-Cell (PIC) model, iPIC3D. The coupling can be performed with identical grid resolutions and time steps. We call this coupled computational plasma model MHD-EPIC (MHD with Embedded PIC regions). Our verification tests show that MHD-EPIC works accurately and robustly. We show a two-dimensional magnetosphere simulation as an illustration of the potential future applications of MHD-EPIC.

  • 34. Deca, J.
    et al.
    Divin, A.
    Lapenta, G.
    Lembège, B.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Horányi, M.
    Electromagnetic Particle-in-Cell Simulations of the Solar Wind Interaction with Lunar Magnetic Anomalies2014In: Physical Review Letters, ISSN 0031-9007, E-ISSN 1079-7114, Vol. 112, no 15, p. 151102-Article in journal (Refereed)
    Abstract [en]

    We present the first three-dimensional fully kinetic and electromagnetic simulations of the solar wind interaction with lunar crustal magnetic anomalies (LMAs). Using the implicit particle-in-cell code IPIC3D, we confirm that LMAs may indeed be strong enough to stand off the solar wind from directly impacting the lunar surface forming a mini-magnetosphere, as suggested by spacecraft observations and theory. In contrast to earlier magnetohydrodynamics and hybrid simulations, the fully kinetic nature of IPIC3D allows us to investigate the space charge effects and in particular the electron dynamics dominating the near-surface lunar plasma environment. We describe for the first time the interaction of a dipole model centered just below the lunar surface under plasma conditions such that only the electron population is magnetized. The fully kinetic treatment identifies electromagnetic modes that alter the magnetic field at scales determined by the electron physics. Driven by strong pressure anisotropies, the mini-magnetosphere is unstable over time, leading to only temporal shielding of the surface underneath. Future human exploration as well as lunar science in general therefore hinges on a better understanding of LMAs.

  • 35. Deca, J.
    et al.
    Lapenta, G.
    Marchand, R.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Spacecraft charging analysis with the implicit particle-in-cell code iPic3D2013In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 20, no 10, p. 102902-Article in journal (Refereed)
    Abstract [en]

    We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter makes it possible to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation results with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron emission, and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.

  • 36. Deca, Jan
    et al.
    Divin, Andrey
    Henri, Pierre
    Eriksson, Anders
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Olshevsky, Vyacheslav
    Horanyi, Mihaly
    Electron and Ion Dynamics of the Solar Wind Interaction with a Weakly Outgassing Comet2017In: Physical Review Letters, ISSN 0031-9007, E-ISSN 1079-7114, Vol. 118, no 20, article id 205101Article in journal (Refereed)
    Abstract [en]

    Using a 3D fully kinetic approach, we disentangle and explain the ion and electron dynamics of the solar wind interaction with a weakly outgassing comet. We show that, to first order, the dynamical interaction is representative of a four-fluid coupled system. We self-consistently simulate and identify the origin of the warm and suprathermal electron distributions observed by ESA's Rosetta mission to comet 67P/Churyumov-Gerasimenko and conclude that a detailed kinetic treatment of the electron dynamics is critical to fully capture the complex physics of mass-loading plasmas.

  • 37. Deca, Jan
    et al.
    Divin, Andrey
    Lembege, Bertrand
    Horanyi, Mihaly
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, Giovanni
    General mechanism and dynamics of the solar wind interaction with lunar magnetic anomalies from 3-D particle-in-cell simulations2015In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 120, no 8, p. 6443-6463Article in journal (Refereed)
    Abstract [en]

    We present a general model of the solar wind interaction with a dipolar lunar crustal magnetic anomaly (LMA) using three-dimensional full-kinetic and electromagnetic simulations. We confirm that LMAs may indeed be strong enough to stand off the solar wind from directly impacting the lunar surface, forming a so-called minimagnetosphere, as suggested by spacecraft observations and theory. We show that the LMA configuration is driven by electron motion because its scale size is small with respect to the gyroradius of the solar wind ions. We identify a population of back-streaming ions, the deflection of magnetized electrons via the E x B drift motion, and the subsequent formation of a halo region of elevated density around the dipole source. Finally, it is shown that the presence and efficiency of these processes are heavily impacted by the upstream plasma conditions and, in turn, influence the overall structure and evolution of the LMA system. Understanding the detailed physics of the solar wind interaction with LMAs, including magnetic shielding, particle dynamics, and surface charging, is vital to evaluate its implications for lunar exploration.

  • 38. Deca, Jan
    et al.
    Divin, Andrey
    Wang, Xu
    Lembege, Bertrand
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Horanyi, Mihaly
    Lapenta, Giovanni
    Three-dimensional full-kinetic simulation of the solar wind interaction with a vertical dipolar lunar magnetic anomaly2016In: Geophysical Research Letters, ISSN 0094-8276, E-ISSN 1944-8007, Vol. 43, no 9, p. 4136-4144Article in journal (Refereed)
    Abstract [en]

    A detailed understanding of the solar wind interaction with lunar magnetic anomalies (LMAs) is essential to identify its implications for lunar exploration and to enhance our physical understanding of particle dynamics in a magnetized plasma. We present the first three-dimensional full-kinetic electromagnetic simulation case study of the solar wind interaction with a vertical dipole, resembling a medium-size LMA. In contrast to a horizontal dipole, we show that a vertical dipole twists its field lines and cannot form a minimagnetosphere. Instead, it creates a ring-shaped weathering pattern and reflects up to 21% (four times more than in the horizontal case) of the incoming solar wind ions electrostatically through the normal electric field formed above the electron shielding region surrounding the cusp. This work delivers a vital piece to fully comprehend and interpret lunar observations, as we find the amount of reflected ions to be a tracer for the underlying field structure.

  • 39. Divin, A.
    et al.
    Khotyaintsev, Y. V.
    Vaivads, Andris
    André, M.
    Toledo-Redondo, S.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Lapenta, G.
    Three-scale structure of diffusion region in the presence of cold ions2016In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 121, no 12, p. 12,001-12,013Article in journal (Refereed)
    Abstract [en]

    Kinetic simulations and spacecraft observations typically display the two-scale structure of the collisionless diffusion region (DR), with electron and ion demagnetization scales governing the spatial extent of the DR. Recent in situ observations of the nightside magnetosphere, as well as investigation of magnetic reconnection events at the Earth's magnetopause, have revealed the presence of a population of cold (tens of eV) ions of ionospheric origin. We present two-dimensional particle-in-cell simulations of collisionless magnetic reconnection in multicomponent plasma with ions consisting of hot and cold populations. We show that a new cold ion diffusion region scale is introduced in between those of hot ions and electrons. The demagnetization scale of the cold ion population is several times (∼4–8) larger than the initial cold ion gyroradius. Cold ions are accelerated and thermalized during magnetic reconnection and form ion beams moving with velocities close to the Alfvén velocity.

  • 40. Divin, A.
    et al.
    Khotyaintsev, Yu. V.
    Vaivads, A.
    Andre, M.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Lapenta, G.
    Evolution of the lower hybrid drift instability at reconnection jet front2015In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 120, no 4, p. 2675-2690Article in journal (Refereed)
    Abstract [en]

    We investigate current-driven modes developing at jet fronts during collisionless reconnection. Initial evolution of the reconnection is simulated using a conventional 2-D setup starting from the Harris equilibrium. Three-dimensional PIC calculations are implemented at later stages, when fronts are fully formed. Intense currents and enhanced wave activity are generated at the fronts because of the interaction of the fast flow plasma and denser ambient current sheet plasma. The study reveals that the lower hybrid drift instability develops quickly in the 3-D simulation. The instability produces strong localized perpendicular electric fields, which are several times larger than the convective electric field at the front, in agreement with Time History of Events and Macroscale Interactions during Substorms observations. The instability generates waves, which escape the front edge and propagate into the undisturbed plasma ahead of the front. The parallel electron pressure is substantially larger in the 3-D simulation compared to that of the 2-D. On a timescale of ~Ω_ci^-1, the instability forms a layer, which contains a mixture of the jet plasma and current sheet plasma. The results confirm that the lower hybrid drift instability is important for the front evolution and electron energization.

  • 41. Divin, A.
    et al.
    Lapenta, G.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Newman, D. L.
    Goldman, M. V.
    Numerical simulations of separatrix instabilities in collisionless magnetic reconnection2012In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 19, no 4, p. 042110-Article in journal (Refereed)
    Abstract [en]

    Electron-scale dynamics of magnetic reconnection separatrix jets is studied in this paper. Instabilities developing in directions both parallel and perpendicular to the magnetic field are investigated. Implicit particle-in-cell simulations with realistic electron-to-ion mass ratio are complemented by a set of small-scale high-resolution runs having the separatrix force balance as the initial condition. A special numerical procedure is developed to introduce the force balance into the small-scale runs. Simulations show the development of streaming instabilities and the consequent formation of electron holes in the parallel direction. A new electron jet instability develops in the perpendicular direction. The instability is closely related to the electron MHD Kelvin-Helmholtz mode and is destabilized by a flow perpendicular to the magnetic field at the separatrix. Tearing instability of the separatrix electron jet is modulated strongly by the electron MHD Kelvin-Helmholtz mode.

  • 42. Divin, A.
    et al.
    Lapenta, Giovanni
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Semenov, V. S.
    Erkaev, N. V.
    Korovinskiy, D. B.
    Biernat, H. K.
    Scaling of the inner electron diffusion region in collisionless magnetic reconnection2012In: Journal of Geophysical Research, ISSN 0148-0227, E-ISSN 2156-2202, Vol. 117, p. A06217-Article in journal (Refereed)
    Abstract [en]

    The Sweet-Parker analysis of the inner electron diffusion region of collisionless magnetic reconnection is presented. The study includes charged particle motion near the X-line and an appropriate approximation of the off-diagonal term of the electron pressure tensor. The obtained scaling shows that the width of the inner electron diffusion region is equal to the electron inertial length, and that electrons are accelerated up to the electron Alfven velocity in the X-line direction. The estimated effective plasma conductivity is based on the electron gyrofrequency rather than the binary collision frequency, and gives the extreme (minimal) value of the plasma conductivity, similar to Bohm diffusion. The scaling properties are verified by means of Particle-in-Cell simulations. An ad hoc parameter needs to be introduced into the scaling relations in order to better match the theory and simulations.

  • 43. Divin, A.
    et al.
    Semenov, V.
    Korovinskiy, D.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Deca, J.
    Olshevsky, V.
    Lapenta, G.
    A new model for the electron pressure nongyrotropy in the outer electron diffusion region2016In: Geophysical Research Letters, ISSN 0094-8276, E-ISSN 1944-8007, Vol. 43, no 20, p. 10565-10573Article in journal (Refereed)
    Abstract [en]

    We present a new model to describe the electron pressure nongyrotropy inside the electron diffusion region (EDR) in an antiparallel magnetic reconnection scenario. A combination of particle-in-cell simulations and analytical estimates is used to identify such a component of the electron pressure tensor in the rotated coordinates, which is nearly invariant along the outflow direction between the X line and the electron remagnetization points in the outer EDR. It is shown that the EDR two-scale structure (inner and outer parts) is formed due to superposition of the nongyrotropic meandering electron population and gyrotropic electron population with large anisotropy parallel to the magnetic field upstream of the EDR. Inside the inner EDR the influence of the pressure anisotropy can largely be ignored. In the outer EDR, a thin electron layer with electron flow speed exceeding the E x B drift velocity is supported by large momentum flux produced by the electron pressure anisotropy upstream of the EDR. We find that this fast electron exhaust flow with |V_e x B| > |E| is in fact a constituent part of the EDR, a finding which will steer the interpretation of the Magnetospheric Multiscale Mission (MMS) data.

  • 44.
    Divin, A.
    et al.
    St Petersburg State Univ, Dept Earths Phys, St Petersburg 198504, Russia..
    Semenov, V.
    St Petersburg State Univ, Dept Earths Phys, St Petersburg 198504, Russia..
    Zaitsev, I.
    St Petersburg State Univ, Dept Earths Phys, St Petersburg 198504, Russia..
    Korovinskiy, D.
    Austrian Acad Sci, Space Res Inst, A-8042 Graz, Austria..
    Deca, J.
    Univ Colorado Boulder, LASP, Boulder, CO 80303 USA..
    Lapenta, G.
    Katholieke Univ Leuven, Dept Math, B-3001 Leuven, Belgium..
    Olshevsky, Viacheslav
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Inner and outer electron diffusion region of antiparallel collisionless reconnection: Density dependence2019In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 26, no 10, article id 102305Article in journal (Refereed)
    Abstract [en]

    We study the inflow density dependence of substructures within the electron diffusion region (EDR) of collisionless symmetric magnetic reconnection. We perform a set of 2.5D particle-in-cell simulations which start from a Harris current layer with a uniform background density n_b. A scan of n_b ranging from 0.02 n_0 to 2 n_0 of the peak current layer density (n_0) is studied, keeping other plasma parameters the same. Various quantities measuring the reconnection rate, EDR spatial scales, and characteristic velocities are introduced. We analyze EDR properties during the quasisteady stage, when the EDR length measures saturate. Consistent with past kinetic simulations, electrons are heated parallel to the B field in the inflow region. The presence of the strong parallel anisotropy acts twofold: (1) the electron pressure anisotropy drift becomes important at the EDR upstream edge in addition to the E x B drift speed, and (2) the pressure anisotropy term -∇·P_e/(ne) modifies the force balance there. We find that the width of the EDR demagnetization region and the EDR current scale as the electron inertial length ~d_e and as ~d_e n_b^0.22, respectively. Magnetic reconnection is fast, with a rate of ~0.1, but depends weakly on density, as ~n_b^(-1/8). Reconnection rate proxies such as the EDR geometrical aspect ratio or the inflow-to-outflow electron velocity ratio are shown to have different density trends, making the electric field the only reliable measure of the reconnection rate.

  • 45.
    Divin, Andrey
    et al.
    St Petersburg State Univ, Ulianovskaya 1, St Petersburg 198504, Russia..
    Deca, Jan
    Univ Colorado, LASP, Boulder, CO 80303 USA.;NASA, Inst Modeling Plasma Atmospheres & Cosm Dust, SSERVI, Moffett Field, CA 94035 USA..
    Eriksson, Anders
    Swedish Inst Space Phys, Box 537, SE-75121 Uppsala, Sweden..
    Henri, Pierre
    CNRS, LPC2E, 3 Ave Rech Sci, F-45071 Orleans, France.;UCA, CNRS, OCA, Lab Lagrange, Nice, France..
    Lapenta, Giovanni
    Katholieke Univ Leuven, CmPA, Dept Math, Celestijnenlaan 200B,Bus 2400, B-3001 Leuven, Belgium..
    Olshevsky, Viacheslav
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    A Fully Kinetic Perspective of Electron Acceleration around a Weakly Outgassing Comet2020In: Astrophysical Journal Letters, ISSN 2041-8205, E-ISSN 2041-8213, Vol. 889, no 2, article id L33Article in journal (Refereed)
    Abstract [en]

    The cometary mission Rosetta has shown the presence of higher-than-expected suprathermal electron fluxes. In this study, using 3D fully kinetic electromagnetic simulations of the interaction of the solar wind with a comet, we constrain the kinetic mechanism that is responsible for the bulk electron energization that creates the suprathermal distribution from the warm background of solar wind electrons. We identify and characterize the magnetic field-aligned ambipolar electric field that ensures quasi-neutrality and traps warm electrons. Solar wind electrons are accelerated to energies as high as 50-70 eV close to the comet nucleus without the need for wave-particle or turbulent heating mechanisms. We find that the accelerating potential controls the parallel electron temperature, total density, and (to a lesser degree) the perpendicular electron temperature and the magnetic field magnitude. Our self-consistent approach enables us to better understand the underlying plasma processes that govern the near-comet plasma environment.

  • 46.
    Dykes, Tim
    et al.
    HPE HPC/AI EMEA Research Lab.
    Foyer, Clément
    HPE HPC/AI EMEA Research Lab, HPC Research Group, Univ. of Bristol.
    Richardson, Harvey
    HPE HPC/AI EMEA Research Lab.
    Svedin, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Tate, Adrian
    Numerical Algorithms Group Ltd. (NAG).
    McIntosh-Smith, Simon
    HPC Research Group, Univ. of Bristol.
    Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems2021In: 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Institute of Electrical and Electronics Engineers (IEEE) , 2021Conference paper (Refereed)
    Abstract [en]

    High performance computing architectures have become increasingly heterogeneous in recent years. This growing architectural variety presents a multi-faceted portability problem affecting applications, libraries, programming models, languages, compilers, run-times, and system software. Approaches to performance portability typically focus heavily on efficient usage of parallel compute architectures and less on data locality abstractions and complex memory systems, with minimal support afforded to effective memory management in traditional HPC languages such as C and Fortran. We present Mamba, a library that facilitates the use of heterogeneous memory systems by high-performance application and library developers through high-level array-based abstractions for memory management, supported by a low-level generic memory API. We detail the library design and implementation, demonstrating generic memory allocation, data layout specification, array tiling, and heterogeneous transport. We evaluate performance in the context of a typical matrix transposition, a DNA sequencing benchmark, and an application use case for high-order spectral-element-based incompressible flow.
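
    As a generic flavor of the kind of abstraction described (all names below are invented for illustration; this is not Mamba's actual interface), an array descriptor can separate the logical shape from its placement and tiling, so that tiles become the unit a transport layer moves between memory spaces:

    ```c
    /* Hypothetical array descriptor separating shape, tiling, and placement. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { SPACE_DDR, SPACE_HBM, SPACE_GPU } mem_space_t;

    typedef struct {
        double     *data;
        size_t      rows, cols;     /* logical shape */
        size_t      tile;           /* square tile edge */
        mem_space_t space;          /* where the allocation lives */
    } array2d_t;

    static array2d_t array2d_alloc(size_t r, size_t c, size_t tile, mem_space_t s)
    {
        /* A real library would dispatch to a per-space allocator; plain
         * malloc stands in for all of them here. */
        array2d_t a = { malloc(r * c * sizeof(double)), r, c, tile, s };
        return a;
    }

    int main(void)
    {
        array2d_t a = array2d_alloc(256, 256, 64, SPACE_DDR);
        /* Tile-wise traversal: each tile is the unit a transport layer
         * would move between memory spaces or devices. */
        for (size_t ti = 0; ti < a.rows; ti += a.tile)
            for (size_t tj = 0; tj < a.cols; tj += a.tile)
                for (size_t i = ti; i < ti + a.tile; i++)
                    for (size_t j = tj; j < tj + a.tile; j++)
                        a.data[i * a.cols + j] = (double)(i + j);
        printf("a[255][255] = %.1f\n", a.data[255 * a.cols + 255]);
        free(a.data);
        return 0;
    }
    ```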

  • 47. Eriksson, S.
    et al.
    Lapenta, G.
    Newman, D. L.
    Phan, T. D.
    Gosling, J. T.
    Lavraud, B.
    Khotyaintsev, Y. V.
    Carr, C. M.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Goldman, M. V.
    On Multiple Reconnection X-Lines and Tripolar Perturbations of Strong Guide Magnetic Fields2015In: Astrophysical Journal, ISSN 0004-637X, E-ISSN 1538-4357, Vol. 805, no 1, article id 43Article in journal (Refereed)
    Abstract [en]

    We report new multi-spacecraft Cluster observations of tripolar guide magnetic field perturbations at a solar wind reconnection exhaust in the presence of a guide field B_M which is almost four times as strong as the reversing field B_L. The novel tripolar field consists of two narrow regions of depressed B_M, with an observed 7%-14% ΔB_M magnitude relative to the external field, which are found adjacent to a wide region of enhanced B_M within the exhaust. A stronger reversing field is associated with each B_M depression. A kinetic reconnection simulation for realistic solar wind conditions and the observed strong guide field reveals that tripolar magnetic fields preferentially form across current sheets in the presence of multiple X-lines as magnetic islands approach one another and merge into fewer and larger islands. The simulated ΔB_M/ΔX_N over the normal width ΔX_N between a B_M minimum and the edge of the external region agrees with the normalized values observed by Cluster. We propose that a tripolar guide field perturbation may be used to identify candidate regions containing multiple X-lines and interacting magnetic islands at individual solar wind current sheets with a strong guide field.

  • 48.
    Faj, Jennifer
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Williams, Jeremy J.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Peng, Ivy Bo
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Ganse, Urs
    University of Helsinki, Helsinki, Finland.
    Battarbee, Markus
    University of Helsinki, Helsinki, Finland.
    Pfau-Kempf, Yann
    University of Helsinki, Helsinki, Finland.
    Kotipalo, Leo
    University of Helsinki, Helsinki, Finland.
    Palmroth, Minna
    University of Helsinki, Helsinki, Finland.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    MPI Performance Analysis in Vlasiator: Unraveling Communication Bottlenecks2023In: SC23 Proccedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, Colorado, USA, 2023Conference paper (Refereed)
    Abstract [en]

    Vlasiator is a popular and powerful massively parallel code for accurate magnetospheric and solar wind plasma simulations. This work provides an in-depth analysis of Vlasiator, focusing on MPI performance using the Integrated Performance Monitoring (IPM) tool. We show that MPI non-blocking point-to-point communication accounts for most of the communication time. The communication topology shows a large number of MPI messages exchanging data in a six-dimensional grid. We also show that relatively large messages are used in MPI communication, reaching up to 256 MB. As a communication-bound application, we found that using OpenMP in Vlasiator is critical for eliminating intra-node communication. Our results provide important insights for optimizing Vlasiator for the upcoming exascale machines.
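
    The dominant pattern the analysis identifies, non-blocking point-to-point exchange, has the canonical MPI shape below; a 1-D ring stands in for Vlasiator's six-dimensional grid neighborhoods, and the payload size is illustrative.

    ```c
    /* Non-blocking halo exchange: MPI_Irecv/MPI_Isend completed by Waitall. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1 << 20;                    /* payload, illustrative */
        double *sendbuf = malloc(n * sizeof(double));
        double *recvbuf = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) sendbuf[i] = rank;

        int right = (rank + 1) % size, left = (rank - 1 + size) % size;
        MPI_Request req[2];

        /* Post the receive first, then the send; computation that does not
         * need the halo could be overlapped before the wait. */
        MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        printf("rank %d received halo from rank %d: %.0f\n",
               rank, left, recvbuf[0]);
        free(sendbuf); free(recvbuf);
        MPI_Finalize();
        return 0;
    }
    ```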

  • 49.
    Flatken, Markus
    et al.
    Institute for Software Technology (SC), Software for Space Systems and Interactive Visualization, German Aerospace Center (DLR), Braunschweig, Germany.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Gerndt, Andreas
    Institute for Software Technology (SC), Software for Space Systems and Interactive Visualization, German Aerospace Center (DLR), Braunschweig, Germany.
    et al.,
    VESTEC: Visual Exploration and Sampling Toolkit for Extreme Computing2023In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 87805-87834Article in journal (Refereed)
    Abstract [en]

    Natural disasters and epidemics are unfortunate recurring events that lead to huge societal and economic losses. Recent advances in supercomputing can facilitate simulations of such scenarios in (or even ahead of) real time, therefore supporting the design of adequate responses by public authorities. By incorporating high-velocity data from sensors and modern high-performance computing systems, ensembles of simulations and advanced analysis enable urgent decision-makers to better monitor the disaster and to take the necessary actions (e.g., evacuating populated areas) to mitigate these events. Unfortunately, frameworks to support such versatile and complex workflows for urgent decision-making are only rarely available and often lack functionality. This paper gives an overview of the VESTEC project and framework, which unifies orchestration, simulation, in-situ data analysis, and visualization of natural disasters, and which can be driven by external sensor data or interactive intervention by the user. We show how the different components interact and work together in VESTEC and describe implementation details. To disseminate our experience, three different types of disaster are evaluated: a wildfire in La Jonquera (Spain), a mosquito-borne disease in two regions of Italy, and magnetic reconnection in the Earth's magnetosphere.

  • 50. Goldman, M. V.
    et al.
    Newman, D. L.
    Lapenta, G.
    Andersson, L.
    Gosling, J. T.
    Eriksson, S.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Eastwood, J. P.
    Ergun, R.
    Cerenkov Emission of Quasiparallel Whistlers by Fast Electron Phase-Space Holes during Magnetic Reconnection2014In: Physical Review Letters, ISSN 0031-9007, E-ISSN 1079-7114, Vol. 112, no 14, p. 145002-Article in journal (Refereed)
    Abstract [en]

    Kinetic simulations of magnetotail reconnection have revealed electromagnetic whistlers originating near the exhaust boundary and propagating into the inflow region. The whistler production mechanism is not a linear instability, but rather Cerenkov emission of almost-parallel whistlers from localized moving clumps of charge (finite-size quasiparticles) associated with nonlinear coherent electron phase space holes. Whistlers are strongly excited by holes without ever growing exponentially. In the simulation the whistlers are emitted in the source region from holes that accelerate down the magnetic separatrix towards the x line. The phase velocity of the whistlers v_φ in the source region is everywhere well matched to the hole velocity v_H, as required by the Cerenkov condition. The simulation shows emission is most efficient near the theoretical maximum v_φ = half the electron Alfven speed, consistent with the new theoretical prediction that faster holes radiate more efficiently. While transferring energy to whistlers, the holes lose coherence and dissipate over a few local ion inertial lengths. The whistlers, however, propagate to the x line and out over many tens of ion inertial lengths into the inflow region of reconnection. As the whistlers pass near the x line, they modulate the rate at which magnetic field lines reconnect.
