KTH Publications (kth.se)
Araújo De Medeiros, Daniel (ORCID: orcid.org/0000-0002-1434-3042)
Publications (10 of 15)
Williams, J. J., Costea, S., Araújo De Medeiros, D., Trilaksono, J., Hegde, P. R., Tskhakaya, D., . . . Markidis, S. (2026). Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale. The international journal of high performance computing applications
2026 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including turbulence and confinement, which are essential for optimizing fusion reactor performance. Transitioning to exascale simulations introduces significant challenges, with traditional file input/output (I/O) inefficiencies remaining a key bottleneck. This work advances BIT1, an electrostatic PIC MC code, by improving the particle mover with OpenMP task-based parallelism, integrating openPMD’s streaming API, and enabling in-memory data streaming with the ADIOS2 Sustainable Staging Transport (SST) engine to enhance I/O performance, computational efficiency, and system storage utilization. We employ profiling tools such as gprof, perf, IPM and Darshan, which provide insights into computation, communication, and I/O operations. We implement time-dependent data checkpointing with the openPMD API enabling seamless data movement and in-situ visualization for real-time analysis without interrupting the simulation. We demonstrate improvements in simulation runtime, data accessibility and real-time insights by comparing traditional file I/O with the ADIOS2 BP4 and SST backends. The proposed hybrid BIT1 openPMD SST enhancement introduces a new paradigm for real-time scientific discovery in plasma simulations, enabling faster insights and more efficient use of exascale computing resources.
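The SST-style in-memory hand-off described above can be sketched, very loosely, with a bounded queue between a producer (the simulation) and an in-situ consumer. This is an illustration of the staging idea only: BIT1 uses the openPMD API over ADIOS2 SST, not Python threads, and the diagnostic values here are invented.

```python
import queue
import threading

# Toy stand-in for SST-style in-memory staging: the "simulation" publishes
# per-step diagnostics to a bounded in-memory queue instead of writing files;
# an "in-situ" consumer analyses each step as it arrives.
# (Illustrative only -- BIT1 streams via openPMD/ADIOS2, not this queue.)

STEPS = 5
stage = queue.Queue(maxsize=2)   # bounded: producer blocks if consumer lags
results = []

def simulation():
    for step in range(STEPS):
        # invented diagnostic: "mean particle energy" for this step
        diag = {"step": step, "mean_energy": 1.0 + 0.1 * step}
        stage.put(diag)          # in-memory hand-off, no file I/O
    stage.put(None)              # end-of-stream marker

def in_situ_consumer():
    while True:
        diag = stage.get()
        if diag is None:
            break
        results.append((diag["step"], diag["mean_energy"]))

t_sim = threading.Thread(target=simulation)
t_vis = threading.Thread(target=in_situ_consumer)
t_sim.start(); t_vis.start()
t_sim.join(); t_vis.join()

print(len(results))  # all 5 steps analysed without writing a file
```

The bounded queue mirrors the back-pressure a staging transport applies when the consumer falls behind, which is why streaming can overlap analysis with computation instead of stalling on disk writes.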

Place, publisher, year, edition, pages
London, United Kingdom: Sage Publications, 2026
Keywords
Hybrid MPI+OMP Parallel Programming, openPMD, ADIOS2, In-Memory Data Streaming, In-situ Visualization, Distributed Computing, Efficient Data Processing, Large-Scale PIC MC Simulations
National Category
Fusion, Plasma and Space Physics; Computer Sciences; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-373650 (URN), 10.1177/10943420251409229 (DOI), 001667829900001 (ISI), 2-s2.0-105028315172 (Scopus ID)
Note

QC 20260204

Available from: 2025-12-04. Created: 2025-12-04. Last updated: 2026-02-04. Bibliographically approved.
Araújo De Medeiros, D., Williams, J. J., Wahlgren, J., Saud Maia Leite, L. & Peng, I. B. (2025). ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments. In: 31st International European Conference on Parallel and Distributed Computing. Paper presented at The 31st International European Conference on Parallel and Distributed Computing (Euro-Par ’25), Dresden, Germany, 25-29 Aug, 2025. Springer Nature
2025 (English). In: 31st International European Conference on Parallel and Distributed Computing, Springer Nature, 2025. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Vertical scaling, HPC workloads, Cloud Computing, Resource Adaptivity, Memory Resource Provisioning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363170 (URN), 10.1007/978-3-031-99854-6_12 (DOI), 2-s2.0-105015430232 (Scopus ID)
Conference
The 31st International European Conference on Parallel and Distributed Computing (Euro-Par ’25), Dresden, Germany, 25-29 Aug, 2025
Note

QC 20250923

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-09-23. Bibliographically approved.
Araújo De Medeiros, D. (2025). Towards Adaptive Resource Management for HPC Workloads in Cloud Environments. (Doctoral dissertation). KTH Royal Institute of Technology
2025 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Maximizing resource efficiency is crucial when designing cloud-based systems, which are primarily built to meet specific quality-of-service requirements. Common optimization techniques include containerization, workflow orchestration, elasticity, and vertical scaling, all aimed at improving resource utilization and reducing costs. In contrast, on-premises high-performance computing systems prioritize maximum performance, typically relying on static resource allocation. While this approach offers certain advantages over cloud systems, it can be restrictive in handling the increasingly dynamic resource demands of tightly coupled HPC workloads, making adaptive resource management challenging.

This thesis explores the execution of high-performance workloads in cloud-based environments, investigating both horizontal and vertical scaling strategies as well as the feasibility of running HPC workflows in the cloud. Additionally, we will evaluate the costs of deploying these workloads in containerized environments and examine the advantages of using object storage in cloud-based HPC systems.

Abstract [sv]

Att maximera resurseffektiviteten är avgörande vid utformningen av molnbaserade system, som främst byggs för att uppfylla specifika krav på tjänstekvalitet. Vanliga optimeringstekniker inkluderar containerisering, arbetsflödesorkestrering, elasticitet och vertikal skalning, med målet att förbättra resursutnyttjandet och minska kostnaderna. I kontrast fokuserar lokala högprestandaberäkningssystem (HPC) på maximal prestanda och förlitar sig oftast på statisk resursallokering. Även om denna strategi har vissa fördelar jämfört med molnlösningar, kan den vara begränsande när det gäller att hantera de allt mer dynamiska resursbehoven hos tätt sammankopplade HPC-arbetslaster, vilket gör adaptiv resursförvaltning utmanande. Denna avhandling undersöker körningen av högprestandaarbetslaster i molnbaserade miljöer, med fokus på både horisontell och vertikal skalning samt möjligheten att köra HPC-arbetsflöden i molnet. Dessutom kommer vi att analysera kostnaderna för att distribuera dessa arbetslaster i containeriserade miljöer och utvärdera fördelarna med att använda objektlagring i molnbaserade HPC-system.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2025. p. 91
Series
TRITA-EECS-AVL ; 2025:51
Keywords
high-performance computing, resource adaptability, cloud computing, containers, horizontal scaling, vertical scaling, object storage, Högprestandaberäkning, resursanpassningsförmåga, molnberäkning, containerisering, horisontell skalning, vertikal skalning, objektlagring
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363164 (URN), 978-91-8106-279-3 (ISBN)
Public defence
2025-06-02, E2, Lindstedtsvägen 3, Stockholm, 14:00 (English)
Note

QC 20250506

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-05-06. Bibliographically approved.
Araújo De Medeiros, D., Schieffer, G., Wahlgren, J. & Peng, I. B. (2025). Understanding Layered Portability from HPC to Cloud in Containerized Environments. In: Weiland, M., Neuwirth, S., Kruse, C. & Weinzierl, T. (Eds.), Proceedings High Performance Computing. ISC High Performance 2024 International Workshops. Paper presented at the 39th International Conference of the ISC High Performance, May 12-16, 2024, Hamburg, Germany (pp. 439-452). Springer Nature, 15058
2025 (English). In: Proceedings High Performance Computing. ISC High Performance 2024 International Workshops / [ed] Weiland, M., Neuwirth, S., Kruse, C. & Weinzierl, T., Springer Nature, 2025, Vol. 15058, p. 439-452. Conference paper, Published paper (Refereed).
Abstract [en]

Recent developments in lightweight OS-level virtualization, i.e., containers, provide a potential solution for running HPC applications on cloud platforms. In this work, we focus on the impact of different layers in a containerized environment when migrating HPC containers from a dedicated HPC system to a cloud platform. On three ARM-based platforms, including the latest Nvidia Grace CPU, we use six representative HPC applications to characterize the impact of container virtualization, host OS and kernel, and rootless and privileged container execution. Our results indicate less than 4% container overhead in DGEMM, miniMD, and XSBench, but 8%-10% overhead in FFT, HPCG, and Hypre. We also show that changing between the container execution modes results in negligible performance differences in the six applications.
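The overhead figures above reduce to a simple relative-slowdown computation; a minimal sketch, with invented timings rather than the paper's measurements:

```python
# Container overhead as reported in the abstract is a relative slowdown of the
# containerized run over the native run. The timings below are hypothetical,
# chosen only to show the arithmetic (a < 4% case, like DGEMM/miniMD/XSBench).

def overhead_pct(native_s: float, container_s: float) -> float:
    """Relative slowdown of a containerized run, in percent."""
    return 100.0 * (container_s / native_s - 1.0)

print(round(overhead_pct(10.0, 10.35), 1))  # prints 3.5
```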

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Cloud and HPC Convergence, Containers, ARM, Performance
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-366126 (URN), 10.1007/978-3-031-73716-9_31 (DOI), 001463189500031 (ISI), 2-s2.0-105009319340 (Scopus ID)
Conference
39th International Conference of the ISC High Performance, May 12-16, 2024, Hamburg, Germany
Note

Part of ISBN 978-3-031-73715-2, 978-3-031-73716-9

QC 20250703

Available from: 2025-07-03. Created: 2025-07-03. Last updated: 2025-07-10. Bibliographically approved.
Schieffer, G., Pornthisan, N., Araújo De Medeiros, D., Markidis, S., Wahlgren, J. & Peng, I. B. (2024). Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU. In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers. Paper presented at international workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Aug 28 2023 - Sep 1 2023, Limassol, Cyprus (pp. 294-305). Springer Nature
2024 (English). In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Springer Nature, 2024, p. 294-305. Conference paper, Published paper (Refereed).
Abstract [en]

High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving and robot localization to time-series prediction. In this work, we investigate the design, development, and optimization of particle filters using half precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on Nvidia V100, A100, A40, and T4 GPUs. To mitigate numerical instability and precision losses, we introduce algorithmic changes in the particle filters. Using half precision leads to a performance improvement of 1.5–2× and 2.5–4.6× with respect to the single- and double-precision baselines, respectively, at the cost of a relatively small loss of accuracy.
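The core trade-off can be illustrated, well away from the paper's CUDA implementation, by normalising particle weights in NumPy's float16 and comparing against a float64 reference. The array size and seed are arbitrary, and the paper's numerical-stability changes are not reproduced here.

```python
import numpy as np

# Toy illustration of the half-precision trade-off the paper studies:
# normalise particle-filter weights entirely in float16 and measure the
# accuracy loss against a float64 reference. (Illustrative only -- the
# paper's GPU kernels and stabilisation tricks are not reproduced.)

rng = np.random.default_rng(42)
weights = rng.random(1024)                 # unnormalised weights, float64

ref = weights / weights.sum()              # float64 reference normalisation
half = weights.astype(np.float16)
half_norm = half / half.sum()              # all-half arithmetic

abs_err = np.abs(half_norm.astype(np.float64) - ref).max()
print(f"max abs error of float16 normalisation: {abs_err:.2e}")
```

The error is small relative to the weights (each of order 1e-3 after normalisation), which is the sense in which the paper can trade a "relatively small loss of accuracy" for throughput.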

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
GPUs, Half-Precision, Particle Filter, Reduced Precision
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-346540 (URN), 10.1007/978-3-031-50684-0_23 (DOI), 001279250600023 (ISI), 2-s2.0-85192268315 (Scopus ID)
Conference
International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Aug 28 2023 - Sep 1 2023 Limassol, Cyprus
Note

Part of proceedings ISBN: 978-3-031-50683-3

QC 20240520

Available from: 2024-05-16. Created: 2024-05-16. Last updated: 2024-09-10. Bibliographically approved.
Williams, J. J., Araújo De Medeiros, D., Costea, S., Tskhakaya, D., Poeschel, F., Widera, R., . . . Markidis, S. (2024). Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring. In: 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops). Paper presented at 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), September 24-27, Kobe, Japan. Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Large-scale HPC simulations of plasma dynamics in fusion devices require efficient parallel I/O to avoid slowing down the simulation and to enable the post-processing of critical information. Such complex simulations lacking parallel I/O capabilities may encounter performance bottlenecks, hindering their effectiveness in data-intensive computing tasks. In this work, we focus on introducing and enhancing the efficiency of parallel I/O operations in Particle-in-Cell Monte Carlo simulations. We first evaluate the scalability of BIT1, a massively-parallel electrostatic PIC MC code, determining its initial write throughput capabilities and performance bottlenecks using an HPC I/O performance monitoring tool, Darshan. We design and develop an adaptor to the openPMD I/O interface that allows us to stream PIC particle and field information to I/O using the BP4 backend, aggressively optimized for I/O efficiency, including the highly efficient ADIOS2 interface. Next, we explore advanced optimization techniques such as data compression, aggregation, and Lustre file striping, achieving write throughput improvements while enhancing data storage efficiency. Finally, we analyze the enhanced high-throughput parallel I/O and storage capabilities achieved through the integration of openPMD with rapid metadata extraction in BP4 format. Our study demonstrates that the integration of openPMD and advanced I/O optimizations significantly enhances BIT1's I/O performance and storage capabilities, successfully introducing high throughput parallel I/O and surpassing the capabilities of traditional file I/O.
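Among the optimisations listed, compression is the easiest to illustrate. The sketch below uses the stdlib's zlib on an idealised, highly regular array as a stand-in for the openPMD/ADIOS2 compression operators the paper actually evaluates; real particle data is noisier and compresses far less well.

```python
import struct
import zlib

# One storage optimisation the paper evaluates is compressing data before it
# is written. BIT1 does this through openPMD/ADIOS2 operators; zlib on a
# synthetic, highly regular "field" array is only a stand-in to show the
# effect on bytes written.

values = [0.5 * i for i in range(10_000)]        # idealised, regular data
raw = struct.pack(f"{len(values)}d", *values)    # bytes file I/O would write

compressed = zlib.compress(raw, level=6)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} B, compressed: {len(compressed)} B, "
      f"ratio: {ratio:.1f}x")
```

Fewer bytes on the wire also helps the other optimisations mentioned (aggregation, Lustre striping), since all of them ultimately reduce pressure on the shared file system.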

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
openPMD, Darshan, ADIOS2, Parallel I/O, Efficient Data Processing, Distributed Storage, Large-Scale PIC Simulations
National Category
Computer Sciences; Fusion, Plasma and Space Physics; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-351318 (URN), 10.1109/CLUSTERWorkshops61563.2024.00022 (DOI), 001422214200011 (ISI), 2-s2.0-85204321286 (Scopus ID)
Conference
2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), September 24-27, Kobe, Japan
Note

QC 20241113

Part of ISBN 979-8-3503-8346-1

Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2025-03-24. Bibliographically approved.
Araújo De Medeiros, D., Markidis, S., Denier, P., et al. (2024). IO-SEA: Storage I/O and Data Management for Exascale Architectures. In: Proceedings of the 21st ACM International Conference on Computing Frontiers 2024 Workshops and Special Sessions, CF 2024 Companion. Paper presented at 21st ACM International Conference on Computing Frontiers, CF 2024, Ischia, Italy, May 7 2024 - May 9 2024 (pp. 94-100). Association for Computing Machinery (ACM)
2024 (English). In: Proceedings of the 21st ACM International Conference on Computing Frontiers 2024 Workshops and Special Sessions, CF 2024 Companion, Association for Computing Machinery (ACM), 2024, p. 94-100. Conference paper, Published paper (Refereed).
Abstract [en]

The new scientific workloads to be executed on the upcoming exascale supercomputers face major challenges in terms of storage, given their extreme volume of data. In particular, intelligent data placement, instrumentation, and workflow handling are central to application performance. The IO-SEA project developed multiple solutions to aid the scientific community in addressing these challenges: a Workflow Manager, a hierarchical storage management system, and a semantic API for storage. All of these major products incorporate additional minor products that support their mission. In this paper, we discuss the roles of these products and how they can assist the scientific community in achieving exascale performance.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
data movement, exascale, hierarchical storage, semantic interface, storage, workflows
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-351498 (URN)10.1145/3637543.3654620 (DOI)001267269200021 ()2-s2.0-85199181981 (Scopus ID)
Conference
21st ACM International Conference on Computing Frontiers, CF 2024, Ischia, Italy, May 7 2024 - May 9 2024
Note

QC 20240822

Available from: 2024-08-22. Created: 2024-08-22. Last updated: 2024-09-24. Bibliographically approved.
Schieffer, G., Araújo De Medeiros, D., Faj, J., Marathe, A. & Peng, I. B. (2024). On the Rise of AMD Matrix Cores: Performance, Power Efficiency, and Programmability. In: Proceedings - 2024 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2024. Paper presented at 2024 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2024, Indianapolis, United States of America, May 5 2024 - May 7 2024 (pp. 132-143). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 132-143. Conference paper, Published paper (Refereed).
Abstract [en]

Matrix multiplication is a core computational part of deep learning and scientific workloads. The emergence of Matrix Cores in high-end AMD GPUs, a building block of Exascale computers, opens new opportunities for optimizing the performance and power efficiency of compute-intensive applications. This work provides a timely, comprehensive characterization of the novel Matrix Cores in AMD GPUs. We develop low-level micro-benchmarks for leveraging Matrix Cores at different levels of parallelism, achieving up to 350, 88, and 69 TFLOPS for mixed, float, and double precision on one GPU. Using results obtained from the micro-benchmarks, we provide a performance model of Matrix Cores that can guide application developers in performance tuning. We also provide the first quantitative study and modeling of the power efficiency of Matrix Cores at different floating-point data types. Finally, we evaluate the high-level programmability of Matrix Cores through the rocBLAS library in a wide range of matrix sizes from 16 to 64K. Our results indicate that application developers can transparently leverage Matrix Cores to deliver more than 92% peak computing throughput by properly selecting data types and interfaces.
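The kind of performance model described can be sketched as a peak-fraction calculation. The peak figures below are the abstract's micro-benchmark numbers (350/88/69 TFLOPS for mixed/float/double on one GPU); the GEMM size and runtime are hypothetical, invented only to show the arithmetic.

```python
# Sketch of a peak-fraction model in the spirit of the paper: given a GEMM
# problem size and a measured runtime, report achieved TFLOPS and the
# fraction of the Matrix-Core peak. Peak values are taken from the abstract;
# the timing passed in below is hypothetical, not a measurement.

PEAK_TFLOPS = {"mixed": 350.0, "float": 88.0, "double": 69.0}

def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for an m x n x k GEMM (2*m*n*k flops)."""
    return 2.0 * m * n * k / seconds / 1e12

achieved = gemm_tflops(8192, 8192, 8192, seconds=0.0155)  # invented timing
fraction = achieved / PEAK_TFLOPS["float"]
print(f"{achieved:.1f} TFLOPS, {100 * fraction:.0f}% of float peak")
```

A model like this lets a developer decide whether a kernel is worth further tuning or is already near the hardware ceiling for its data type.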

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
AI00, AMD GPU, AMD MI250, Matrix Core, Tensor Core
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-351749 (URN), 10.1109/ISPASS61541.2024.00022 (DOI), 001486977800012 (ISI), 2-s2.0-85199903360 (Scopus ID)
Conference
2024 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2024, Indianapolis, United States of America, May 5 2024 - May 7 2024
Note

Part of ISBN 9798350376388

QC 20240823

Available from: 2024-08-13. Created: 2024-08-13. Last updated: 2025-12-08. Bibliographically approved.
Peng, I. B., Schulz, M., Haus, U. U., Prunty, C., Marcuello, P., Danovaro, E., . . . Markidis, S. (2024). OpenCUBE: Building an Open Source Cloud Blueprint with EPI Systems. In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, 2023, Revised Selected Papers. Paper presented at International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 2023 - Sep 1 2023 (pp. 260-264). Springer Nature
2024 (English). In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, 2023, Revised Selected Papers, Springer Nature, 2024, p. 260-264. Conference paper, Published paper (Refereed).
Abstract [en]

OpenCUBE aims to develop an open-source full software stack for Cloud computing blueprint deployed on EPI hardware, adaptable to emerging workloads across the computing continuum. OpenCUBE prioritizes energy awareness and utilizes open APIs, Open Source components, advanced SiPearl Rhea processors, and RISC-V accelerator. The project leverages representative workloads, such as cloud-native workloads and workflows of weather forecast data management, molecular docking, and space weather, for evaluation and validation.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science
Keywords
Computing continuum, Converged HPC and Cloud, EPI, Open-source, RISC-V
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-346145 (URN), 10.1007/978-3-031-48803-0_29 (DOI), 001279248600029 (ISI), 2-s2.0-85190984237 (Scopus ID)
Conference
International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 2023 - Sep 1 2023
Note

Part of proceedings ISBN: 978-3-031-48802-3

QC 20240506

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2024-09-10. Bibliographically approved.
Araújo De Medeiros, D., Schieffer, G., Wahlgren, J. & Peng, I. (2024). Understanding Layered Portability from HPC to Cloud in Containerized Environments. In: Proceedings of the International Supercomputing Conference, Workshops. Paper presented at the International Workshop on Converged Computing on Edge, Cloud, and HPC (WOCC ’24).
2024 (English). In: Proceedings of the International Supercomputing Conference, Workshops, 2024. Conference paper, Published paper (Refereed).
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363169 (URN)
Conference
International Workshop on Converged Computing on Edge, Cloud, and HPC (WOCC ’24)
Note

QC 20250507

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-05-07. Bibliographically approved.