Publications (10 of 93)
Peng, I. B., Vetter, J. S., Moore, S., Joydeep, R. & Markidis, S. (2019). Analyzing the Suitability of Contemporary 3D-Stacked PIM Architectures for HPC Scientific Applications. In: CF '19 - Proceedings of the 16th ACM International Conference on Computing Frontiers. Paper presented at 16th ACM International Conference on Computing Frontiers, CF 2019; Alghero, Sardinia; Italy; 30 April 2019 through 2 May 2019 (pp. 256-262). Association for Computing Machinery (ACM)
Analyzing the Suitability of Contemporary 3D-Stacked PIM Architectures for HPC Scientific Applications
2019 (English). In: CF '19 - Proceedings of the 16th ACM International Conference on Computing Frontiers, Association for Computing Machinery (ACM), 2019, pp. 256-262. Conference paper, Published paper (Refereed)
Abstract [en]

Scaling off-chip bandwidth is challenging due to fundamental limitations, such as a fixed pin count and plateauing signaling rates. Recently, vendors have turned to 2.5D and 3D stacking to closely integrate system components. Interestingly, these technologies can integrate a logic layer under multiple memory dies, enabling computing capability inside a memory stack. This trend in stacking is making PIM architectures commercially viable. In this work, we investigate the suitability of offloading kernels in scientific applications onto 3D-stacked PIM architectures. We evaluate several hardware constraints resulting from the stacked structure. We perform extensive simulation experiments and in-depth analysis to quantify the impact of application locality in TLBs, data caches, and memory stacks. Our results also identify design optimization areas in software and hardware for HPC scientific applications.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
processing-in-memory, 3D stacked memory, PIM
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-255514 (URN)10.1145/3310273.3322831 (DOI)000474686400036 ()2-s2.0-85066055698 (Scopus ID)
Conference
16th ACM International Conference on Computing Frontiers, CF 2019; Alghero, Sardinia; Italy; 30 April 2019 through 2 May 2019
Note

QC 20191016

Available from: 2019-10-16 Created: 2019-10-16 Last updated: 2019-10-16. Bibliographically approved
Rivas Gomez, S., Markidis, S., Laure, E., Brabazon, K., Perks, O. & Narasimhamurthy, S. (2019). Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018: . Paper presented at 20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018 (pp. 921-927). Institute of Electrical and Electronics Engineers (IEEE)
Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks
2019 (English). In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 921-927. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process becomes unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.
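As a rough illustration of the communication pattern the abstract describes, the sketch below uses MPI one-sided operations with passive-target synchronization, letting a mapper deposit partial word counts into a reducer's window without the target participating in the call. The bucket count, hash value, and owner mapping are hypothetical; this is a minimal sketch, not the paper's implementation.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: each rank exposes a counter array in an MPI window;
 * mappers accumulate partial word counts into the owning reducer's
 * window one-sidedly, so no matching receive is ever posted.
 * NBUCKETS and the owner mapping below are hypothetical. */
#define NBUCKETS 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long *counts;                        /* local window memory */
    MPI_Win win;
    MPI_Win_allocate(NBUCKETS * sizeof(long), sizeof(long),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &counts, &win);
    for (int i = 0; i < NBUCKETS; i++) counts[i] = 0;
    MPI_Barrier(MPI_COMM_WORLD);

    /* "Map" step: deposit one count for bucket b on its owner
     * without involving the target process. */
    long one = 1;
    int b = 42;                          /* hash of some word */
    int owner = b % size;
    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Accumulate(&one, 1, MPI_LONG, owner,
                   b / size, 1, MPI_LONG, MPI_SUM, win);
    MPI_Win_unlock(owner, win);

    MPI_Barrier(MPI_COMM_WORLD);         /* all deposits now visible */
    if (rank == owner)
        printf("bucket %d count: %ld (expected %d)\n",
               b, counts[b / size], size);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Because the deposit uses MPI_Accumulate under a shared lock, several mappers can target the same reducer concurrently, which is what lets an overloaded process keep making progress while others finish early.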

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
High Performance Computing, MapReduce, MPI One Sided Communication
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-246358 (URN)10.1109/HPCC/SmartCity/DSS.2018.00153 (DOI)000468511200121 ()2-s2.0-85062487109 (Scopus ID)9781538666142 (ISBN)
Conference
20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018
Note

QC 20190319

Available from: 2019-03-19 Created: 2019-03-19 Last updated: 2019-11-01. Bibliographically approved
Sishtla, C. P., Divin, A., Deca, J., Olshevsky, V. & Markidis, S. (2019). Electron trapping in the coma of a weakly outgassing comet. Physics of Plasmas, 26(10), Article ID 102904.
Electron trapping in the coma of a weakly outgassing comet
2019 (English). In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 26, no. 10, article id 102904. Article in journal (Refereed), Published
Abstract [en]

Measurements from the Rosetta mission have shown a multitude of nonthermal electron distributions in the cometary environment, challenging the previously assumed plasma interaction mechanisms near a cometary nucleus. In this paper, we discuss electron trapping near a weakly outgassing comet from a fully kinetic (particle-in-cell) perspective. Using the electromagnetic fields derived from the simulation, we characterize the trajectories of trapped electrons in the potential well surrounding the cometary nucleus and identify the distinguishing features in their respective velocity and pitch angle distributions. Our analysis allows us to define a clear boundary in velocity phase space between the distributions of trapped and passing electrons.
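For intuition, a textbook electrostatic trapping criterion compares an electron's kinetic energy to the depth of the potential well; this is an illustrative condition only, not necessarily the phase-space boundary derived in the paper, and the well-depth symbol is assumed here.

```latex
% Illustrative trapping condition; \phi_w (well depth) is an assumed
% symbol, not the paper's notation.
\tfrac{1}{2} m_e v^2 < e\,\phi_w \;\Rightarrow\; \text{trapped},
\qquad
\tfrac{1}{2} m_e v^2 \geq e\,\phi_w \;\Rightarrow\; \text{passing}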

Place, publisher, year, edition, pages
American Institute of Physics (AIP), 2019
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-266916 (URN)10.1063/1.5115456 (DOI)000505980600044 ()2-s2.0-85074294951 (Scopus ID)
Note

QC 20200322

Available from: 2020-03-22 Created: 2020-03-22 Last updated: 2020-03-22. Bibliographically approved
Zhou, H., Toth, G., Jia, X., Chen, Y. & Markidis, S. (2019). Embedded Kinetic Simulation of Ganymede's Magnetosphere: Improvements and Inferences. Journal of Geophysical Research - Space Physics, 124(7), 5441-5460
Embedded Kinetic Simulation of Ganymede's Magnetosphere: Improvements and Inferences
2019 (English). In: Journal of Geophysical Research - Space Physics, ISSN 2169-9380, E-ISSN 2169-9402, Vol. 124, no. 7, pp. 5441-5460. Article in journal (Refereed), Published
Abstract [en]

The largest moon in the solar system, Ganymede, is also the only moon known to possess a strong intrinsic magnetic field and a corresponding magnetosphere. Using the new version of the Hall magnetohydrodynamics with embedded particle-in-cell model, with a self-consistently coupled resistive body representing the electrical properties of the moon's interior, improved inner boundary conditions, and the flexibility of coupling different grid geometries, we achieve a better match of the magnetic field with measurements for all six Galileo flybys. The G2 flyby comparisons of plasma bulk flow velocities with the Galileo Plasma Subsystem data support the oxygen ion assumption inside Ganymede's magnetosphere. Crescent-shaped, nongyrotropic, and nonisotropic ion distributions are identified from the coupled model. Furthermore, we have derived the energy fluxes associated with the upstream magnetopause reconnection of ~10^-7 W/cm^2 based on our model results and found a maximum contribution of 40% to the total peak auroral emissions.

Place, publisher, year, edition, pages
American Geophysical Union (AGU), 2019
Keywords
Ganymede, simulation, magnetosphere, reconnection
National Category
Geophysics
Identifiers
urn:nbn:se:kth:diva-259461 (URN)10.1029/2019JA026643 (DOI)000482985600033 ()2-s2.0-85069678381 (Scopus ID)
Note

QC 20190920

Available from: 2019-09-20 Created: 2019-09-20 Last updated: 2019-09-20. Bibliographically approved
Divin, A., Semenov, V., Zaitsev, I., Korovinskiy, D., Deca, J., Lapenta, G., . . . Markidis, S. (2019). Inner and outer electron diffusion region of antiparallel collisionless reconnection: Density dependence. Physics of Plasmas, 26(10), Article ID 102305.
Inner and outer electron diffusion region of antiparallel collisionless reconnection: Density dependence
2019 (English). In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 26, no. 10, article id 102305. Article in journal (Refereed), Published
Abstract [en]

We study the inflow density dependence of substructures within the electron diffusion region (EDR) of collisionless symmetric magnetic reconnection. We perform a set of 2.5D particle-in-cell simulations that start from a Harris current layer with a uniform background density n_b. A scan of n_b ranging from 0.02 n_0 to 2 n_0 of the peak current layer density (n_0) is studied, keeping other plasma parameters the same. Various quantities measuring the reconnection rate, EDR spatial scales, and characteristic velocities are introduced. We analyze EDR properties during the quasi-steady stage, when the EDR length measures saturate. Consistent with past kinetic simulations, electrons are heated parallel to the B field in the inflow region. The presence of the strong parallel anisotropy acts twofold: (1) the electron pressure anisotropy drift becomes important at the EDR upstream edge in addition to the E x B drift speed, and (2) the pressure anisotropy term -∇·P_e/(ne) modifies the force balance there. We find that the width of the EDR demagnetization region and the EDR current are proportional to the electron inertial length, scaling as ~d_e and ~d_e n_b^0.22, respectively. Magnetic reconnection is fast, with a rate of ~0.1, but depends weakly on density as ~n_b^(-1/8). Reconnection rate proxies such as the EDR geometrical aspect ratio or the inflow-to-outflow electron velocity ratio are shown to have different density trends, making the electric field the only reliable measure of the reconnection rate.
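For reference, the scalings quoted in the abstract can be written compactly; the labels w_EDR, J_EDR, and E_R are introduced here for illustration and are not the paper's notation.

```latex
% EDR width, current, and reconnection-rate scalings from the abstract;
% w_EDR, J_EDR, E_R are illustrative labels, d_e the electron inertial
% length, n_b the background density in units of the peak density n_0.
w_{\mathrm{EDR}} \sim d_e, \qquad
J_{\mathrm{EDR}} \sim d_e\, n_b^{0.22}, \qquad
E_R \sim 0.1\, n_b^{-1/8}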

Place, publisher, year, edition, pages
American Institute of Physics (AIP), 2019
National Category
Fusion, Plasma and Space Physics
Identifiers
urn:nbn:se:kth:diva-266922 (URN)10.1063/1.5109368 (DOI)000505980600024 ()2-s2.0-85073601321 (Scopus ID)
Note

QC 20200214

Available from: 2020-02-14 Created: 2020-02-14 Last updated: 2020-03-10. Bibliographically approved
Simmendinger, C., Iakymchuk, R., Cebamanos, L., Akhmetova, D., Bartsch, V., Rotaru, T., . . . Markidis, S. (2019). Interoperability strategies for GASPI and MPI in large-scale scientific applications. The international journal of high performance computing applications, 33(3), 554-568
Interoperability strategies for GASPI and MPI in large-scale scientific applications
2019 (English). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no. 3, pp. 554-568. Article in journal (Refereed), Published
Abstract [en]

One of the main hurdles of partitioned global address space (PGAS) approaches is the dominance of the message passing interface (MPI), which, as the de facto standard, appears in the code base of many applications. To take advantage of PGAS APIs like the global address space programming interface (GASPI) without major changes to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication- and performance-critical parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.
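To make the interoperability idea concrete, the sketch below mixes MPI process startup with GASPI's one-sided write-plus-notification primitives from the GPI-2 API (gaspi_write_notify / gaspi_notify_waitsome). It shows plain GASPI notifications, not the shared-window and shared-notification extensions the paper proposes; the segment id, offsets, and sizes are hypothetical, and GPI-2 must be built with MPI interoperability enabled for this mixing to work.

```c
#include <GASPI.h>
#include <mpi.h>

/* Sketch: MPI owns process startup; GASPI performs a halo-style
 * one-sided transfer to the right neighbor. Segment 0, the offsets,
 * and the 1 KiB segment size are hypothetical. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    gaspi_proc_init(GASPI_BLOCK);        /* GASPI alongside MPI */

    gaspi_rank_t rank, nranks;
    gaspi_proc_rank(&rank);
    gaspi_proc_num(&nranks);

    const gaspi_segment_id_t seg = 0;
    gaspi_segment_create(seg, 1 << 10, GASPI_GROUP_ALL,
                         GASPI_BLOCK, GASPI_MEM_INITIALIZED);

    gaspi_rank_t right = (rank + 1) % nranks;
    /* One-sided write of 256 bytes to the right neighbor, fused with
     * notification 0 so the target can detect data arrival. */
    gaspi_write_notify(seg, 0, right, seg, 512, 256,
                       0 /* notify id */, 1 /* notify value */,
                       0 /* queue */, GASPI_BLOCK);

    gaspi_notification_id_t first;
    gaspi_notify_waitsome(seg, 0, 1, &first, GASPI_BLOCK);
    gaspi_notification_t val;
    gaspi_notify_reset(seg, first, &val); /* consume the notification */

    gaspi_wait(0, GASPI_BLOCK);           /* flush the send queue */
    gaspi_proc_term(GASPI_BLOCK);
    MPI_Finalize();
    return 0;
}
```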

Place, publisher, year, edition, pages
SAGE Publications Ltd, 2019
Keywords
Interoperability, GASPI, MPI, iPIC3D, Ludwig, MiniGhost, halo exchange, Allreduce
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-254034 (URN)10.1177/1094342018808359 (DOI)000468919900011 ()2-s2.0-85059353725 (Scopus ID)
Note

QC 20190814

Available from: 2019-08-14 Created: 2019-08-14 Last updated: 2019-08-14. Bibliographically approved
Wallden, M., Markidis, S., Okita, M. & Ino, F. (2019). Memory Efficient Load Balancing for Distributed Large-Scale Volume Rendering Using a Two-Layered Group Structure. IEICE transactions on information and systems, E102D(12), 2306-2316
Memory Efficient Load Balancing for Distributed Large-Scale Volume Rendering Using a Two-Layered Group Structure
2019 (English). In: IEICE Transactions on Information and Systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E102D, no. 12, pp. 2306-2316. Article in journal (Refereed), Published
Abstract [en]

We propose a novel compositing pipeline and a dynamic load balancing technique for volume rendering which utilizes a two-layered group structure to achieve effective and scalable load balancing. The technique enables each process to render data from non-contiguous regions of the volume with minimal impact on the total render time. We demonstrate the effectiveness of the proposed technique by performing a set of experiments on a modern GPU cluster. The experiments show that using the technique results in up to a 35.7% lower worst-case memory usage as compared to a dynamic k-d tree load balancing technique, whilst simultaneously achieving similar or higher render performance. The proposed technique was also able to lower the amount of transferred data during the load balancing stage by up to 72.2%. The technique has the potential to be used in many scenarios where other dynamic load balancing techniques have proved to be inadequate, such as during large-scale visualization.

Keywords
large-scale visualization, distributed computing, load balancing, GPU
National Category
Computer and Information Sciences; Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-265514 (URN)10.1587/transinf.2019PAP0003 (DOI)000499697000004 ()2-s2.0-85076443726 (Scopus ID)
Note

QC 20191213

Available from: 2019-12-13 Created: 2019-12-13 Last updated: 2020-01-08. Bibliographically approved
Sishtla, C. P., Olshevsky, V., Chien, W. D., Markidis, S. & Laure, E. (2019). Particle-in-Cell Simulations of Plasma Dynamics in Cometary Environment. In: Journal of Physics: Conference Series. Paper presented at 13th International Conference on Numerical Modeling of Space Plasma Flows, ASTRONUM 2018; Panama City Beach; United States; 25 June 2018 through 29 June 2018. Institute of Physics Publishing (IOPP), 1225(1), Article ID 012009.
Particle-in-Cell Simulations of Plasma Dynamics in Cometary Environment
2019 (English). In: Journal of Physics: Conference Series, Institute of Physics Publishing (IOPP), 2019, Vol. 1225, no. 1, article id 012009. Conference paper, Published paper (Refereed)
Abstract [en]

We perform and analyze global Particle-in-Cell (PIC) simulations of the interaction between the solar wind and an outgassing comet with the goal of studying the plasma kinetic dynamics of a cometary environment. To achieve this, we design and implement a new numerical method in the iPIC3D code to model outgassing from the comet: new plasma particles are ejected from the comet "surface" at each computational cycle. Our simulations show that a bow shock is formed as a result of the interaction between the solar wind and the outgassed particles. The analysis of distribution functions for the PIC simulations shows that at the bow shock, part of the incoming solar wind ions are reflected while electrons are heated. This work attempts to reveal kinetic effects in the atmosphere of an outgassing comet using a fully kinetic Particle-in-Cell model.
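The outgassing scheme the abstract describes (new particles ejected from the comet "surface" each computational cycle) could look roughly like the sketch below. The particle struct, injection count, and velocity parameters are hypothetical and are not taken from iPIC3D's actual data structures.

```c
#include <stdlib.h>
#include <math.h>

/* Hypothetical particle record and injection routine: each cycle,
 * n_inject particles are placed on a sphere of radius r_comet around
 * the comet center and given an outward radial drift v_out plus a
 * small random thermal component v_th. */
typedef struct { double x, y, z, vx, vy, vz; } Particle;

static double urand(void) { return rand() / (double)RAND_MAX; }

void inject_outgassed(Particle *p, int n_inject,
                      double cx, double cy, double cz,
                      double r_comet, double v_out, double v_th) {
    const double PI = acos(-1.0);
    for (int i = 0; i < n_inject; i++) {
        /* Uniform random direction on the unit sphere. */
        double u = 2.0 * urand() - 1.0;            /* cos(theta) */
        double phi = 2.0 * PI * urand();
        double s = sqrt(1.0 - u * u);
        double nx = s * cos(phi), ny = s * sin(phi), nz = u;

        p[i].x = cx + r_comet * nx;                /* on the surface */
        p[i].y = cy + r_comet * ny;
        p[i].z = cz + r_comet * nz;
        p[i].vx = v_out * nx + v_th * (urand() - 0.5);
        p[i].vy = v_out * ny + v_th * (urand() - 0.5);
        p[i].vz = v_out * nz + v_th * (urand() - 0.5);
    }
}
```

Calling such a routine once per PIC cycle steadily builds up the neutral-derived plasma population whose interaction with the solar wind forms the bow shock discussed above.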

Place, publisher, year, edition, pages
Institute of Physics Publishing (IOPP), 2019
Series
Journal of Physics: Conference Series, ISSN 1742-6588; 1225
National Category
Physical Sciences
Identifiers
urn:nbn:se:kth:diva-262635 (URN)10.1088/1742-6596/1225/1/012009 (DOI)000478669600009 ()2-s2.0-85068062214 (Scopus ID)
Conference
13th International Conference on Numerical Modeling of Space Plasma Flows, ASTRONUM 2018; Panama City Beach; United States; 25 June 2018 through 29 June 2018
Note

QC 20191018

Available from: 2019-10-18 Created: 2019-10-18 Last updated: 2019-11-07. Bibliographically approved
Chien, W. D., Peng, I. & Markidis, S. (2019). Performance evaluation of advanced features in CUDA unified memory. In: Proceedings of MCHPC 2019: Workshop on Memory Centric High Performance Computing - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, MCHPC@SC 2019, Denver, CO, USA, November 18, 2019 (pp. 50-57). Institute of Electrical and Electronics Engineers Inc.
Performance evaluation of advanced features in CUDA unified memory
2019 (English). In: Proceedings of MCHPC 2019: Workshop on Memory Centric High Performance Computing - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 50-57. Conference paper, Published paper (Refereed)
Abstract [en]

CUDA Unified Memory improves GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to evaluate both in-memory and oversubscription performance. The results show that memory advises on the Intel-Volta/Pascal-PCIe platform bring negligible improvement for in-memory executions. However, when GPU memory is oversubscribed by about 50%, using memory advises results in up to 25% performance improvement compared to the basic CUDA Unified Memory. In contrast, the Power9-Volta-NVLink platform can substantially benefit from memory advises, achieving up to 34% performance gain for in-memory executions. However, when GPU memory is oversubscribed on this platform, using memory advises increases GPU page faults and results in considerable performance loss. The CUDA prefetch also shows a different performance impact on the two platforms: it improves performance by up to 50% on the Intel-Volta/Pascal-PCIe platform but brings little benefit to the Power9-Volta-NVLink platform.
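As a concrete illustration of the two features under evaluation, the snippet below allocates managed memory, applies memory advises, and prefetches ahead of use via the standard CUDA runtime API. The buffer size and device id are placeholders; this is a generic usage sketch, not the paper's benchmark code.

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Sketch of the two Unified Memory features being evaluated:
 * cudaMemAdvise (memory advises) and cudaMemPrefetchAsync (prefetch).
 * The 256 MiB size and device 0 are placeholders. */
int main(void) {
    const size_t bytes = 256UL << 20;
    int dev = 0;
    float *buf = NULL;

    cudaMallocManaged((void **)&buf, bytes, cudaMemAttachGlobal);

    /* Advise: prefer GPU residency, but let the CPU access the pages
     * without forcing migrations back to host memory. */
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, dev);
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetAccessedBy, cudaCpuDeviceId);

    /* Asynchronous prefetch to the GPU before kernels touch the data,
     * avoiding on-demand page faults during execution. */
    cudaMemPrefetchAsync(buf, bytes, dev, 0 /* default stream */);
    cudaDeviceSynchronize();

    /* ... launch kernels operating on buf here ... */

    cudaFree(buf);
    printf("done\n");
    return 0;
}
```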

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
CUDA memory hints, CUDA Unified Memory, GPU, Memory oversubscription, UVM
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-268024 (URN)10.1109/MCHPC49590.2019.00014 (DOI)2-s2.0-85078538028 (Scopus ID)9781728160078 (ISBN)
Conference
2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, MCHPC@SC 2019, Denver, CO, USA, November 18, 2019
Note

QC 20200327

Available from: 2020-03-27 Created: 2020-03-27 Last updated: 2020-03-27. Bibliographically approved
Narasimhamurthy, S., Danilov, N., Wu, S., Umanesan, G., Markidis, S., Rivas-Gomez, S., . . . de Witt, S. (2019). SAGE: Percipient Storage for Exascale Data Centric Computing. Parallel Computing, 83, 22-33
SAGE: Percipient Storage for Exascale Data Centric Computing
2019 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 83, pp. 22-33. Article in journal (Refereed), Published
Abstract [en]

We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure, termed SAGE (Percipient StorAGe for Exascale Data Centric Computing), as we head towards the era of Exascale computing. The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime and will provide the capability for Exascale-class applications to use such a storage infrastructure. SAGE addresses the increasing overlap between Big Data analysis and HPC in an era of next-generation data-centric computing, an era driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analysed, and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, a problem that has not been sufficiently addressed for simulation codes, is directly targeted by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and to look at early results obtained by employing some of its key methodologies, as the system continues to evolve.

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
SAGE architecture, Object storage, Mero, Clovis, PGAS I/O, MPI I/O, MPI streams
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-254119 (URN)10.1016/j.parco.2018.03.002 (DOI)000469898400003 ()2-s2.0-85044917976 (Scopus ID)
Note

QC 20190624

Available from: 2019-06-24 Created: 2019-06-24 Last updated: 2019-06-24. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0639-0639