Publications (10 of 23)
Flatken, M., Podobas, A., Chien, W. D., Markidis, S., Gerndt, A., et al. (2023). VESTEC: Visual Exploration and Sampling Toolkit for Extreme Computing. IEEE Access, 11, 87805-87834
2023 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 87805-87834. Article in journal (Refereed). Published.
Abstract [en]

Natural disasters and epidemics are unfortunate recurring events that lead to huge societal and economic loss. Recent advances in supercomputing can facilitate simulations of such scenarios in (or even ahead of) real time, thereby supporting the design of adequate responses by public authorities. By incorporating high-velocity data from sensors and modern high-performance computing systems, ensembles of simulations and advanced analysis enable urgent decision-makers to better monitor a disaster and to take necessary actions (e.g., to evacuate populated areas) to mitigate these events. Unfortunately, frameworks that support such versatile and complex workflows for urgent decision-making are rarely available and often lack functionality. This paper gives an overview of the VESTEC project and framework, which unifies orchestration, simulation, in-situ data analysis, and visualization of natural disasters, driven by external sensor data or interactive intervention by the user. We show how the different components interact and work together in VESTEC and describe implementation details. To disseminate our experience, three different types of disaster are evaluated: a wildfire in La Jonquera (Spain), a mosquito-borne disease in two regions of Italy, and magnetic reconnection in the Earth's magnetosphere.
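To make the orchestration pattern concrete, here is a toy Python sketch of the sensor-triggered ensemble loop the abstract describes (incoming observations trigger an ensemble of perturbed simulations whose results feed analysis for decision makers). Every name below is a hypothetical placeholder; none of it is VESTEC's actual API.

```python
import random

# Hypothetical placeholders standing in for VESTEC components (not real APIs).
def read_sensor_feed():
    """Pretend observation, e.g. a fire-front spread rate from field sensors."""
    return random.uniform(0.0, 1.0)

def run_simulation(member_id, spread_rate):
    """Stand-in for submitting one ensemble member to an HPC system."""
    return {"member": member_id, "burned_area": spread_rate * random.uniform(0.8, 1.2)}

def analyse(results):
    """Stand-in for in-situ analysis/visualization feeding urgent decision makers."""
    return max(r["burned_area"] for r in results)

ALERT_THRESHOLD = 0.7
observation = read_sensor_feed()
if observation > ALERT_THRESHOLD:
    # Launch an ensemble with perturbed inputs and summarize the spread of outcomes.
    ensemble = [run_simulation(i, observation * (1 + 0.05 * i)) for i in range(8)]
    print("worst-case burned-area estimate:", analyse(ensemble))
```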

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
decision making, ensemble simulation, high-performance computing, in-situ processing, interactive data processing, Scientific visualization, topological data analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-338517 (URN), 10.1109/ACCESS.2023.3301177 (DOI), 001093869300001 (), 2-s2.0-85166747303 (Scopus ID)
Note

QC 20231114

Available from: 2023-11-14. Created: 2023-11-14. Last updated: 2024-02-29. Bibliographically approved.
Atzori, M., Köpp, W., Chien, W. D., Massaro, D., Mallor, F., Peplinski, A., . . . Weinkauf, T. (2022). In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. Journal of Supercomputing, 78(3), 3605-3620
2022 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 78, no. 3, p. 3605-3620. Article in journal (Refereed). Published.
Abstract [en]

In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible to process, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for ParaView Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH's Beskow Cray XC40 supercomputer and assess the impact of in situ visualization on Nek5000 performance. In our case study, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the results of this study highlight the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.
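For readers unfamiliar with Catalyst, the sketch below shows the general shape of a legacy ParaView Catalyst (Catalyst 1) Python co-processing script of the kind a simulation adaptor drives each time step. It is a minimal assumed example (the slice filter and the output frequency are placeholders, and the Nek5000 adaptor itself is written in C/Fortran), not the pipeline used in the paper.

```python
# Minimal legacy ParaView Catalyst co-processing script (illustrative only).
from paraview.simple import Slice
from paraview import coprocessing

def CreateCoProcessor():
    def _CreatePipeline(coprocessor, datadescription):
        class Pipeline:
            # 'input' is the grid the simulation adaptor hands to Catalyst each step.
            input = coprocessor.CreateProducer(datadescription, 'input')
            # Placeholder analysis: a single slice through the flow field.
            slice1 = Slice(Input=input)
        return Pipeline()

    class CoProcessor(coprocessing.CoProcessor):
        def CreatePipeline(self, datadescription):
            self.Pipeline = _CreatePipeline(self, datadescription)

    coprocessor = CoProcessor()
    coprocessor.SetUpdateFrequencies({'input': [10]})  # run in situ every 10 steps
    return coprocessor

coprocessor = CreateCoProcessor()

def RequestDataDescription(datadescription):
    # The adaptor asks which fields/meshes are needed for this time step.
    coprocessor.LoadRequestedData(datadescription)

def DoCoProcessing(datadescription):
    # Called with the current solution: update the pipeline and write output.
    coprocessor.UpdateProducers(datadescription)
    coprocessor.WriteData(datadescription)
    coprocessor.WriteImages(datadescription)
```

The expensive step discussed in the abstract, parallel image composition across MPI ranks, happens inside ParaView when such a pipeline renders images in situ.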

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Computational fluid dynamics, High-performance computing, In situ visualization, Catalysts, Data visualization, Efficiency, Image enhancement, Scalability, Supercomputers, Visualization, Application scenario, High performance computing systems, High-fidelity simulations, High-performance simulation, Large scale turbulence, Parallel efficiency, Relative efficiency, Technical challenges, In situ processing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-311178 (URN), 10.1007/s11227-021-03990-3 (DOI), 000680293400003 (), 35210696 (PubMedID), 2-s2.0-85111797526 (Scopus ID)
Note

QC 20220502

Available from: 2022-05-02. Created: 2022-05-02. Last updated: 2024-01-19. Bibliographically approved.
Chien, W. D. (2022). Large-scale I/O Models for Traditional and Emerging HPC Workloads on Next-Generation HPC Storage Systems. (Doctoral dissertation). Stockholm: Kungliga Tekniska högskolan
2022 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The ability to create value from large-scale data is now an essential part of research and drives technological development everywhere, from everyday technology to life-saving medical applications. In almost all scientific fields that require handling large-scale data, such as weather forecasting, physics simulation, and computational biology, supercomputers (HPC systems) have emerged as an essential tool for implementing and solving problems. While the computational speed of supercomputers has grown rapidly, the methods for handling large-scale data I/O (reading and writing data) at a high pace have not evolved as much. POSIX-based Parallel File Systems (PFS) and programming interfaces such as MPI-IO remain the norm for I/O workflows in HPC. At the same time, new applications, such as big data and Machine Learning (ML), have emerged as a widely deployed class of HPC applications. While all these applications require the ingestion and output of large amounts of data, they have very different usage patterns, giving a different set of requirements. Apart from that, new I/O technologies for HPC, such as fast burst buffers and object stores, are increasingly available, but methods to fully exploit them in HPC applications are currently lacking.
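To make the "MPI-IO norm" mentioned above concrete, the following mpi4py sketch (an assumed illustration, not code from the thesis) shows the traditional model in which every rank writes its block of a 1D-decomposed array into one shared file with a collective call.

```python
# Traditional shared-file parallel I/O with MPI-IO (mpi4py illustration).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_n = 1024                                   # elements owned by this rank
local = np.full(local_n, rank, dtype=np.float64)

fh = MPI.File.Open(comm, "field.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local_n * local.itemsize         # byte offset of this rank's block
fh.Write_at_all(offset, local)                   # collective write into one shared file
fh.Close()
```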

In this thesis, we evaluate modern storage infrastructures and the I/O programming-model landscape, and we characterize how HPC applications can take advantage of these I/O models to tackle bottlenecks. In particular, we look into object storage, a promising technology that has the potential to replace existing I/O subsystems for large-scale data storage. Firstly, we mimic object-storage semantics and create an emulator on top of existing parallel file systems to project the performance improvement that can be expected from a real object store for HPC applications. Secondly, we develop a programming model that supports numerical data storage for scientific applications. The set of interfaces captures the needs of parallel applications that use domain decomposition. Finally, we evaluate how the interfaces can be used by scientific applications. More specifically, we show for the first time how our programming interface can be used to leverage Seagate's Motr object store. Aside from that, we also showcase how this approach can enable the use of modern node-local hierarchical storage architectures.

Aside from advancements in I/O infrastructure, the wide deployment of modern ML workloads introduces unique challenges to HPC and its I/O systems. We first study these challenges by focusing on a state-of-the-art Deep Learning (DL) framework, TensorFlow, which is widely used on cloud platforms. We evaluate how data ingestion in TensorFlow differs from that of traditional HPC applications. While TensorFlow focuses on DL applications, there are alternative learning methods that pose different sets of challenges. To complement our understanding, we also propose a framework called StreamBrain, which implements a brain-like learning algorithm, the Bayesian Confidence Propagation Neural Network (BCPNN). We find that these alternative methods can potentially impose an even bigger challenge than conventional learning methods (such as those present in TensorFlow). To explain the I/O behavior of DL training, we perform a series of measurements and profiling runs on TensorFlow using monitoring tools. However, we find that existing methods are insufficient to derive fine-grained I/O characteristics of these modern frameworks due to a lack of application-level coupling. To tackle this challenge, we propose a system called tf-Darshan that combines traditional HPC I/O monitoring with ML workload profiling to enable fine-grained I/O performance evaluation. Our findings show that the lack of co-design between modern frameworks and the HPC I/O subsystem leads to inefficient I/O (e.g., very small and random reads), and that these frameworks also fail to coordinate I/O requests efficiently in a parallel environment. With tf-Darshan, we showcase how knowledge derived from such measurements can be used to explain and improve I/O performance; examples include selective data staging to fast storage and future auto-tuning of I/O parameters.
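As an illustration of the kind of data-ingestion pipeline being profiled, the generic tf.data sketch below (with placeholder file names; this is not tf-Darshan itself) produces exactly the access pattern discussed above: many small, shuffled reads from record shards, overlapped with training.

```python
# Generic TensorFlow input pipeline whose I/O a Darshan-style profiler would observe.
import tensorflow as tf

files = tf.data.Dataset.list_files("train_shard_*.tfrecord", shuffle=True)

dataset = (
    files.interleave(tf.data.TFRecordDataset,
                     cycle_length=4,                     # read several shards concurrently
                     num_parallel_calls=tf.data.AUTOTUNE)
         .shuffle(buffer_size=10_000)                    # randomizes (small) reads
         .batch(256)
         .prefetch(tf.data.AUTOTUNE)                     # overlap I/O with training
)

for batch in dataset.take(1):
    print(batch.shape)
```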

The methods proposed in this thesis are evaluated on a variety of HPC systems, workstations, and prototype systems with different I/O and compute architectures. Different HPC applications are used to validate the approaches. The experiments show that our approaches can enable a good characterization of I/O performance, and our proposed programming model illustrates how applications can use next-generation storage systems.

Abstract [sv]

Förmågan att skapa värde från stora datamängder är nu en väsentlig del av forskningen och driver teknisk utveckling överallt från vardagsteknik till livräddande medicinska tillämpningar. Inom nästan alla vetenskapliga områden som kräver hantering av stora datamängder, såsom väderprognos, fysiksimulering och beräkningsbiologi, har superdatorer (HPC-system) dykt upp som ett viktigt verktyg för att implementera och lösa problem. Medan superdatorernas beräkningshastighet har vuxit snabbt, har metoderna för att hantera storskalig data-I/O (läsa och skriva data) i hög takt inte utvecklats i samma takt. POSIX-baserade parallella filsystem (PFS) och programmeringsgränssnitt som MPI-IO förblir normen för I/O-arbetsflöden i HPC. Samtidigt har nya applikationer, såsom big data, och Machine Learning (ML) vuxit fram som en ny, vida använd, klass av HPC-program. Även om alla dessa applikationer kräver intag och utmatning av en stor mängd data, har de väldigt olika användningsmönster, vilket ger en annan uppsättning krav. Bortsett från det blir nya I/O-tekniker på HPC som snabba burst-buffertar (burst buffers) och objektlager (object stores) alltmer tillgängliga. Dessa saknar för närvarande en ny metod för att utnyttja deras fulla potential.

I denna avhandling utvärderar vi moderna lagringsinfrastrukturer och landskapet av I/O-programmeringsmodeller, och karakteriserar hur HPC-applikationer kan dra nytta av dessa I/O-modeller för att förbättra prestanda. Vi tittar särskilt på objektlagring, som är en lovande teknik med potential att ersätta befintliga I/O-delsystem för storskalig datalagring. Vi utvecklar en metod som härmar objektlagringssemantiken och skapar en emulator ovanpå befintliga parallella filsystem för att projicera den prestandaförbättring som kan förväntas på en riktig objektlagring för HPC-applikationer. Efter det utvecklar vi en programmeringsmodell som stöder numerisk datalagring för vetenskapliga tillämpningar. Uppsättningen av gränssnitt fångar behoven hos parallella applikationer som använder domändekomposition. Slutligen utvärderar vi hur gränssnitten kan användas av vetenskapliga tillämpningar. Mer specifikt visar vi för första gången hur vårt programmeringsgränssnitt kan användas för att dra nytta av Seagates Motr-objektlager. Dessutom visar vi också hur detta tillvägagångssätt kan möjliggöra användningen av moderna nodlokala hierarkiska lagringsarkitekturer.

Bortsett från framsteg inom I/O-infrastruktur, introducerar den breda distributionen av moderna ML-arbetsbelastningar unika utmaningar för HPC och dess I/O-system. Vi börjar med att förstå vilka utmaningar som finns genom att fokusera på ett toppmodernt Deep-Learning (DL) ramverk kallat TensorFlow, som används flitigt på molnplattformar. Vi utvärderar hur dataintag i TensorFlow skiljer sig från traditionella HPC-applikationer för att förstå utmaningarna. Medan TensorFlow fokuserar på DL-applikationer, finns det alternativa inlärningsmetoder som bär med sig olika uppsättningar av utmaningar. För att komplettera vår förståelse föreslår vi också ett ramverk som heter StreamBrain. StreamBrain implementerar en hjärnliknande inlärningsalgoritm som kallas Bayesian Confidence Propagation Neural Network (BCPNN). Vi finner att dessa alternativa metoder potentiellt kan innebära en ännu större utmaning än konventionella inlärningsmetoder (som de som finns i TensorFlow). För att förklara I/O-beteendet för DL-träning utför vi en serie mätningar och profilering på TensorFlow med hjälp av profileringsverktyg. Vi finner dock att befintliga metoder är otillräckliga för att härleda en finkornig I/O-karaktäristik på dessa moderna ramverk på grund av bristen på koppling på applikationsnivå. För att tackla denna utmaning föreslår vi ett system som heter tf-Darshan som kombinerar traditionell HPC I/O-övervakning med ML-arbetsbelastningsprofilering för att möjliggöra en finkornig utvärdering av I/O-prestanda. Våra resultat visar att bristen på samdesign mellan moderna ramverk och HPC I/O-delsystemet leder till ineffektiv I/O (t.ex. mycket små och slumpmässiga läsningar). De misslyckas också med att koordinera I/O-förfrågningar på ett effektivt sätt i en parallell miljö. Med tf-Darshan visar vi hur kunskap som härrör från sådana mätningar kan användas för att förklara och förbättra I/O-prestanda. Några exempel inkluderar selektiv dataförflyttning (staging) till snabb lagring och framtida automatisk justering av I/O-parametrar.

Metoderna som föreslås i denna avhandling utvärderas på en mängd olika HPC-system, arbetsstationer och prototypsystem med olika I/O- och beräkningsarkitekturer. Olika HPC-applikationer används för att validera tillvägagångssätten. Experimenten visar att våra tillvägagångssätt kan möjliggöra en bra karakterisering av I/O-prestanda, och vår föreslagna programmeringsmodell illustrerar hur applikationer kan använda nästa generations lagringssystem.

Place, publisher, year, edition, pages
Stockholm: Kungliga Tekniska högskolan, 2022. p. 111
Series
TRITA-EECS-AVL ; 2022:25
Keywords
HPC, I/O, Parallel I/O, MPI, object storage, TensorFlow, Data-Centric, Artificial Intelligence, Machine Learning
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310828 (URN), 978-91-8040-191-3 (ISBN)
Public defence
2022-04-29, F3, 15:00 (English)
Funder
EU, Horizon 2020, 800999
Note

QC 20220411

Available from: 2022-04-11. Created: 2022-04-07. Last updated: 2022-06-25. Bibliographically approved.
Chien, W. D., Podobas, A., Svedin, M., Tkachuk, A., El Sayed, S., Herman, P., . . . Markidis, S. (2022). NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage. In: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing. Paper presented at the 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper, Published paper (Refereed).
Abstract [en]

Strong consistency and stateful workflows are seen as major factors limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite wide deployment in the cloud, its adoption in HPC remains low. We argue that one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g., HDF5, binary) and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate's Motr object store through NoaSci. We evaluate NoaSci's preliminary performance using the iPIC3D space weather application and position it against existing I/O methods.
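NoaSci's real interface is not reproduced here; the hedged sketch below only illustrates the object-style pattern the abstract targets, in which each rank writes its domain-decomposed block as an independent, immutable object keyed by field, step, and rank, with no shared-file locking. The key layout and the use of a plain directory as a stand-in object store are assumptions made for illustration.

```python
# Hypothetical object-per-rank output pattern (NOT NoaSci's actual API).
# A directory of files stands in for an object store or node-local burst buffer.
from mpi4py import MPI
import numpy as np
import os

comm = MPI.COMM_WORLD
my_rank = comm.Get_rank()

def put_object(store_dir, field, step, rank, array):
    """Write one immutable object; no locking or shared state is involved."""
    os.makedirs(store_dir, exist_ok=True)
    key = f"{field}.step{step:06d}.rank{rank:05d}"
    np.save(os.path.join(store_dir, key + ".npy"), array)

local_block = np.random.rand(64, 64)          # this rank's piece of the domain
put_object("objstore", field="density", step=42, rank=my_rank, array=local_block)
```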

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
Euromicro Conference on Parallel, Distributed and Network-Based Processing, ISSN 1066-6192
Keywords
Object-stores, Parallel I/O for Object Stores
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310827 (URN), 10.1109/PDP55904.2022.00034 (DOI), 000827652300025 (), 2-s2.0-85129620089 (Scopus ID)
Conference
30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
Note

Part of proceedings: ISBN 978-1-6654-6958-6

QC 20220826

Available from: 2022-04-07. Created: 2022-04-07. Last updated: 2023-01-16. Bibliographically approved.
Brown, N., Nash, R., Gibb, G., Belikov, E., Podobas, A., Chien, W. D., . . . Gerndt, A. (2022). Workflows to Driving High-Performance Interactive Supercomputing for Urgent Decision Making. In: Anzt, H., Bienz, A., Luszczek, P. & Baboulin, M. (Eds.), High Performance Computing, ISC High Performance 2022 International Workshops. Paper presented at the 37th International Supercomputing Conference on High Performance Computing (ISC High Performance), May 29 - June 2, 2022, Hamburg, Germany (pp. 233-244). Springer Nature, 13387
2022 (English). In: High Performance Computing, ISC High Performance 2022 International Workshops / [ed] Anzt, H., Bienz, A., Luszczek, P., Baboulin, M., Springer Nature, 2022, Vol. 13387, p. 233-244. Conference paper, Published paper (Refereed).
Abstract [en]

Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads that could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, in a structured and accessible manner. In this paper we explore the role of workflows both from the perspective of marshalling and control of urgent workloads and at the level of the individual HPC machine. Ultimately, two workflow systems are required; using an urgent space-weather prediction use case, we explore the benefits that these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.
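As a toy sketch of the two-level pattern described above (an outer marshalling workflow delegating work to a machine-level workflow on the HPC system), the functions below are hypothetical placeholders; neither the VESTEC workflow engine nor any real job-scheduler API is shown.

```python
import time

# Hypothetical placeholders for the two workflow layers (not real APIs).
def submit_machine_workflow(simulation, inputs):
    """Stand-in for handing a job chain to the HPC machine-level workflow system."""
    print(f"submitting {simulation} with {inputs}")
    return "job-0001"

def poll_status(job_id):
    """Stand-in for querying the machine-level workflow; always 'done' in this toy."""
    return "done"

def marshalling_workflow(sensor_event):
    """Outer urgent-computing workflow reacting to an external trigger."""
    job = submit_machine_workflow("space_weather_forecast", sensor_event)
    while poll_status(job) != "done":
        time.sleep(10)
    print("forecast ready, notifying decision makers")

marshalling_workflow({"solar_wind_speed_km_s": 750})
```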

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Workflows, Interactive HPC, Urgent computing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326881 (URN), 10.1007/978-3-031-23220-6_16 (DOI), 000971487500017 (), 2-s2.0-85148688296 (Scopus ID)
Conference
37th International Supercomputing Conference on High Performance Computing (ISC High Performance), May 29 - June 2, 2022, Hamburg, Germany
Note

QC 20230515

Available from: 2023-05-15. Created: 2023-05-15. Last updated: 2023-05-15. Bibliographically approved.
Svedin, M., Chien, W. D., Chikafa, G., Jansson, N. & Podobas, A. (2021). Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers. In: ACM International Conference Proceeding Series. Paper presented at the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021), 21-23 June 2021, online, Germany. Association for Computing Machinery (ACM)
2021 (English). In: ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), 2021. Conference paper, Published paper (Refereed).
Abstract [en]

For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th-generation HPC-grade GPU, the Ampere 100 (A100), claiming significant performance improvements over previous generations, particularly for AI workloads, and introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non-AI benchmarks, and can we expect the A100 to deliver the application improvements we have grown used to with previous GPU generations? In this paper, we benchmark the A100 GPU and compare it to four previous generations of GPUs, with a particular focus on empirically quantifying our derived performance expectations. We find that the A100 delivers a smaller performance increase than previous generations for the well-known Rodinia benchmark suite; we show that some of these performance anomalies can be remedied through clever use of the new data-movement features, which we microbenchmark and demonstrate where (and, more importantly, how) they should be used.
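As a generic example of the kind of micro-benchmark such a study relies on (an assumed sketch, not the paper's actual harness), the CuPy snippet below times a host-to-device transfer with CUDA events to estimate transfer bandwidth on whichever GPU is present.

```python
# Generic GPU transfer micro-benchmark using CUDA events via CuPy (illustrative).
import numpy as np
import cupy as cp

nbytes = 256 * 1024 * 1024                      # 256 MiB payload
host = np.random.rand(nbytes // 8)              # float64 host buffer

start, stop = cp.cuda.Event(), cp.cuda.Event()
start.record()
device = cp.asarray(host)                       # host-to-device copy
stop.record()
stop.synchronize()

ms = cp.cuda.get_elapsed_time(start, stop)      # milliseconds between the two events
print(f"H2D bandwidth: {nbytes / (ms * 1e-3) / 1e9:.1f} GB/s")
```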

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
Benchmarking, Computer graphics, Program processors, Architectural features, Asynchronous data, Benchmark suites, Data movements, Micro-benchmark, Performance anomaly, Performance expectations, Reliable computing, Graphics processing unit
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-310411 (URN), 10.1145/3468044.3468053 (DOI), 2-s2.0-85109396133 (Scopus ID)
Conference
11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021), 21-23 June 2021, online, Germany.
Note

Part of proceedings ISBN: 978-1-4503-8549-7

QC 20220331

Available from: 2022-03-31. Created: 2022-03-31. Last updated: 2024-03-15. Bibliographically approved.
Svedin, M., Podobas, A., Chien, S. W. & Markidis, S. (2021). Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER 2021). Paper presented at the IEEE International Conference on Cluster Computing (CLUSTER), September 7-10, 2021, online (pp. 705-710). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE International Conference on Cluster Computing (CLUSTER 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 705-710. Conference paper, Published paper (Refereed).
Abstract [en]

One of the most promising approaches for the analysis and exploration of large data sets is Machine Learning (ML) techniques that are inspired by brain models. Such methods use alternative learning rules that are potentially more efficient than established ones. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problems: we discuss BCPNN and an HPC implementation called StreamBrain, its computational cost, and its suitability to HPC systems. As an example, we use StreamBrain to analyze the Higgs boson dataset from high-energy physics and discriminate between background and signal classes in collisions at high-energy particle colliders. Overall, we reach up to 69.15% accuracy and 76.4% Area Under the Curve (AUC) performance.
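To make the reported metrics concrete, the snippet below shows how accuracy and AUC are typically computed from a binary classifier's scores with scikit-learn; the labels and scores are made up for illustration and are not StreamBrain output.

```python
# Generic accuracy/AUC evaluation (illustrative labels and scores only).
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # background=0, signal=1
y_score = np.array([0.1, 0.4, 0.8, 0.6, 0.9, 0.3, 0.7, 0.5])   # classifier scores

y_pred = (y_score >= 0.5).astype(int)                           # threshold at 0.5
print("accuracy:", accuracy_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))
```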

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Conference on Cluster Computing, ISSN 1552-5244
Keywords
Brain-inspired Machine Learning, BCPNN, Higgs Boson Dataset, High-Energy Physics
National Category
Subatomic Physics
Identifiers
urn:nbn:se:kth:diva-307012 (URN), 10.1109/Cluster48925.2021.00105 (DOI), 000728391000070 (), 2-s2.0-85126060910 (Scopus ID)
Conference
IEEE International Conference on Cluster Computing (CLUSTER), September 7-10, 2021, online
Note

Part of proceedings: ISBN 978-1-7281-9666-4, QC 20230118

Available from: 2022-01-12. Created: 2022-01-12. Last updated: 2023-01-18. Bibliographically approved.
Atzori, M., Köpp, W., Chien, W. D., Massaro, D., Mallor, F., Peplinski, A., . . . Weinkauf, T. (2021). In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst.
2021 (English). Report (Other academic).
Abstract [en]

In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible to process, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with ParaView Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess the impact of in-situ visualization on Nek5000 performance. In our case study, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulations.

National Category
Mechanical Engineering
Research subject
Engineering Mechanics
Identifiers
urn:nbn:se:kth:diva-295679 (URN)
Funder
Swedish Foundation for Strategic Research, BD15-0082; European Commission, 800999 (SAGE2)
Note

QC 20210525

Available from: 2021-05-25. Created: 2021-05-25. Last updated: 2024-03-15. Bibliographically approved.
Podobas, A., Svedin, M., Chien, W. D., Peng, I. B., Ravichandran, N. B., Herman, P., . . . Markidis, S. (2021). StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs. In: ACM International Conference Proceeding Series. Paper presented at the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021), 21-23 June 2021, online, Germany. Association for Computing Machinery (ACM)
2021 (English). In: ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), 2021. Conference paper, Published paper (Refereed).
Abstract [en]

The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other, less well-known machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In this paper, we introduce StreamBrain, a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems. StreamBrain is a domain-specific language (DSL), similar in concept to existing machine learning (ML) frameworks, and supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate that StreamBrain can train on the well-known ML benchmark dataset MNIST within seconds, and we are the first to demonstrate BCPNN on STL-10-sized networks. We also show how StreamBrain can be used to train with custom floating-point formats and illustrate the impact of using different bfloat variations on BCPNN using FPGAs.
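For readers unfamiliar with the learning rule, here is a schematic NumPy sketch of the classical BCPNN weight computation from exponentially smoothed activation and co-activation probabilities. This is a textbook-style illustration under assumed parameter values; StreamBrain's actual implementation and API may differ.

```python
# Schematic BCPNN-style update (illustrative, not StreamBrain's implementation).
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, tau, eps = 32, 16, 0.01, 1e-6
p_i = np.full(n_pre, 0.5)              # smoothed presynaptic activation probabilities
p_j = np.full(n_post, 0.5)             # smoothed postsynaptic activation probabilities
p_ij = np.full((n_pre, n_post), 0.25)  # smoothed co-activation probabilities

for _ in range(100):
    x = (rng.random(n_pre) < 0.3).astype(float)    # presynaptic activity pattern
    y = (rng.random(n_post) < 0.3).astype(float)   # postsynaptic activity pattern
    # Exponential moving averages of activations and co-activations.
    p_i += tau * (x - p_i)
    p_j += tau * (y - p_j)
    p_ij += tau * (np.outer(x, y) - p_ij)

# Bayesian weights and biases derived from the tracked probabilities.
w = np.log((p_ij + eps) / (np.outer(p_i, p_j) + eps))
b = np.log(p_j + eps)
print(w.shape, b.shape)
```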

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
AI, BCPNN, Emerging Machine Learning, FPGA, GPU, HPC, Neural networks, Representation learning, Unsupervised learning, Backpropagation, Deep learning, Digital arithmetic, Field programmable gate arrays (FPGA), Learning systems, Problem oriented languages, Program processors, Application area, Benchmark datasets, Domain specific languages, Floating points, High performance computing systems, Learning methods, Multiple domains, Theoretical foundations
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-310412 (URN), 10.1145/3468044.3468052 (DOI), 2-s2.0-85109368800 (Scopus ID)
Conference
11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021), 21-23 June 2021, online, Germany.
Note

Part of proceedings ISBN: 978-1-4503-8549-7

QC 20220331

Available from: 2022-03-31. Created: 2022-03-31. Last updated: 2023-01-18. Bibliographically approved.
Brown, N., Nash, R., Poletti, P., Guzzetta, G., Manica, M., Zardini, A., . . . Gerndt, A. (2021). Utilising urgent computing to tackle the spread of mosquito-borne diseases. In: Proceedings of UrgentHPC 2021: The Third International Workshop on HPC for Urgent Decision Making. Paper presented at the 3rd International Workshop on HPC for Urgent Decision Making (UrgentHPC), November 19, 2021, St. Louis, MO (pp. 36-44). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: Proceedings of UrgentHPC 2021: The Third International Workshop on HPC for Urgent Decision Making, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 36-44. Conference paper, Published paper (Refereed).
Abstract [en]

It is estimated that around 80% of the world's population live in areas susceptible to at least one major vector-borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, outbreaks of such diseases are becoming more common and widespread, driven in recent years largely by socio-demographic and climatic factors. These trends are causing significant worry to global health organisations, including the CDC and WHO, and so an important question is the role that technology can play in addressing them. In this work we describe the integration of an epidemiology model, which simulates the spread of mosquito-borne diseases, with the VESTEC urgent computing ecosystem. The intention of this work is to empower human-health professionals to exploit this model and more easily explore the progression of mosquito-borne diseases. Such exploration has traditionally been the domain of a few research scientists; by leveraging state-of-the-art visualisation and analytics techniques, all supported by running the computational workloads on HPC machines in a seamless fashion, we demonstrate the significant advantages that such an integration can provide. Furthermore, we demonstrate the benefits of using an ecosystem such as VESTEC, which provides a framework for urgent computing, in supporting the easy adoption of these technologies by epidemiologists and disaster-response professionals more widely.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Mosquito-borne diseases, urgent computing, HPC, disease simulation, epidemiology
National Category
Public Health, Global Health and Social Medicine
Identifiers
urn:nbn:se:kth:diva-310051 (URN), 10.1109/UrgentHPC54802.2021.00010 (DOI), 000758406900005 (), 2-s2.0-85124466277 (Scopus ID)
Conference
3rd International Workshop on HPC for Urgent Decision Making (UrgentHPC), November 19, 2021, St. Louis, MO
Note

Part of proceedings: ISBN 978-1-6654-1130-1

QC 20220323

Available from: 2022-03-23. Created: 2022-03-23. Last updated: 2025-02-20. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-6408-3333
