kth.se Publications
  • 1.
    Abraham, Mark James
    et al.
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Apostolov, Rossen
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Barnoud, Jonathan
    Univ Groningen, NL-9712 CP Groningen, Netherlands.;Univ Bristol, Intangible Real Lab, Bristol, Avon, England..
    Bauer, Paul
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Blau, Christian
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Bonvin, Alexandre M. J. J.
    Univ Utrecht, Bijvoet Ctr, Fac Sci, Utrecht, Netherlands..
    Chavent, Matthieu
    Univ Paul Sabatier, IPBS, F-31062 Toulouse, France..
    Chodera, John
    Mem Sloan Kettering Canc Ctr, Sloan Kettering Inst, Computat & Syst Biol Program, New York, NY 10065 USA..
    Condic-Jurkic, Karmen
    Mem Sloan Kettering Canc Ctr, Sloan Kettering Inst, Computat & Syst Biol Program, New York, NY 10065 USA.;Open Force Field Consortium, La Jolla, CA USA..
    Delemotte, Lucie
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Grubmueller, Helmut
    Max Planck Inst Biophys Chem, D-37077 Gottingen, Germany..
    Howard, Rebecca
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Jordan, E. Joseph
    Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Box 1031, SE-17121 Solna, Sweden..
    Lindahl, Erik
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ollila, O. H. Samuli
    Univ Helsinki, Inst Biotechnol, SF-00100 Helsinki, Finland..
    Selent, Jana
    Pompeu Fabra Univ, Hosp del Mar Med Res Inst IMIM, Res Programme Biomed Informat, Barcelona 08002, Spain.;Pompeu Fabra Univ, Dept Expt & Hlth Sci, Barcelona 08002, Spain..
    Smith, Daniel G. A.
    Mol Sci Software Inst, Blacksburg, VA 24060 USA..
    Stansfeld, Phillip J.
    Univ Oxford, Dept Biochem, Oxford OX1 2JD, England.;Univ Warwick, Sch Life Sci, Coventry CV4 7AL, W Midlands, England.;Univ Warwick, Dept Chem, Coventry CV4 7AL, W Midlands, England..
    Tiemann, Johanna K. S.
    Univ Leipzig, Fac Med, Inst Med Phys & Biophys, D-04107 Leipzig, Germany..
    Trellet, Mikael
    Univ Utrecht, Bijvoet Ctr, Fac Sci, Utrecht, Netherlands..
    Woods, Christopher
    Univ Bristol, Bristol BS8 1TH, Avon, England..
    Zhmurov, Artem
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sharing Data from Molecular Simulations. 2019. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 59, no. 10, p. 4093-4099. Article in journal (Refereed)
    Abstract [en]

    Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to perform molecular dynamics simulations, often highly tuned for exploiting specific classes of hardware, each with strong communities surrounding them, but with very limited interoperability/transferability options. Thus, the choice of the software package often dictates the workflow for both simulation production and analysis. The level of detail in documenting the workflows and analysis code varies greatly in published work, hindering reproducibility of the reported results and the ability for other researchers to build on these studies. An increasing number of researchers are motivated to make their data available, but many challenges remain in order to effectively share and reuse simulation data. To discuss these and other issues related to best practices in the field in general, we organized a workshop in November 2018 (https://bioexcel.eu/events/workshop-on-sharing-data-from-molecular-simulations/). Here, we present a brief overview of this workshop and topics discussed. We hope this effort will spark further conversation in the MD community to pave the way toward more open, interoperable, and reproducible outputs coming from research studies using MD simulations.

  • 2.
    Aguilar, Xavier
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Performance Monitoring, Analysis, and Real-Time Introspection on Large-Scale Parallel Systems. 2020. Doctoral thesis, monograph (Other academic)
    Abstract [translated from sv]

    HPC (high-performance computing) has become an indispensable tool for large research projects in fields such as drug design and climate modelling. The enormous computing power of HPC systems has also made it possible for researchers to simulate problems that were unthinkable only a few years ago. There is a problem, however: the increasing complexity of HPC systems makes it challenging to develop efficient software capable of exploiting these resources. Performance monitoring and software analysis must therefore play an important role in revealing performance problems in parallel systems. The development of performance tools faces similar challenges, however, and must be able to handle ever-growing amounts of generated data.

    In this thesis we propose a new model for performance characterization of MPI applications that seeks to address the problem of large data volumes. Our method uses "Event Flow" graphs to balance the scalability of profiling methods, i.e. performance data consisting of aggregated metrics, against the information provided by tracing methods, i.e. files with time-stamped events. These graphs allow us to encode the events and thereby reduce the need for storage, which leads to much lower memory and disk usage and, ultimately, to increased scalability. We also demonstrate in this thesis how our "Event Flow" graph model can be used for trace compression. In addition, we propose a new method that uses "Event Flow" graphs to automatically explore the structure of MPI applications. This knowledge can subsequently be used to collect performance data in a smarter way, reducing the amount of redundant data collected. Finally, we show that our graphs can be used in areas beyond trace compression and automatic analysis of performance data, namely for exploring performance data visually.

    Besides "Event Flow" graphs, this thesis also investigates the design and use of frameworks for performance introspection. Future HPC systems will be highly dynamic environments capable of extreme levels of parallelism, but with limited energy consumption, considerable resource sharing, and heterogeneous hardware. Using real-time data to orchestrate program execution in such complex and dynamic environments will become a necessity. This thesis presents two different frameworks for introspection of performance data. These frameworks are easy to use, provide performance data in real time, and require few resources. We demonstrate, among other things, how our approach can be used to reduce the system's energy consumption in real time.

    The methods proposed in this thesis have been validated on various large-scale HPC systems with many cores as well as with current scientific applications. The experiments show that our methods for performance characterization and introspection of performance data are not resource-intensive and can contribute to the performance monitoring of future HPC systems.

  • 3.
    Aguilar, Xavier
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. 2021. In: 2021 IEEE International Conference On Cluster Computing (CLUSTER 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 692-697. Conference paper (Refereed)
    Abstract [en]

    We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep-Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) to solve the two-stream instability test. We verify that the DL-based MLP PIC method produces the correct results using the two-stream instability: the DL-based PIC provides the expected growth rate of the two-stream instability. The DL-based PIC does not conserve the total energy and momentum. However, the DL-based PIC method is stable against the cold-beam instability that affects traditional PIC methods. This work shows that integrating DL technologies into traditional computational methods is a viable approach for developing next-generation PIC algorithms.

  • 4.
    Ahlin, Daniel
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Bauermeister, Boris
    Stockholm Univ, Dept Phys, Oskar Klein Ctr, Stockholm, Sweden..
    Conrad, Jan
    Stockholm Univ, Dept Phys, Oskar Klein Ctr, Stockholm, Sweden..
    Gardner, Robert
    Univ Chicago, Enrico Fermi Inst, 5640 S Ellis Ave, Chicago, IL 60637 USA..
    Grandi, Luca
    Univ Chicago, Dept Phys, Chicago, IL 60637 USA.;Univ Chicago, Kavli Inst Cosmol Phys, Chicago, IL 60637 USA..
    Riedel, Benedikt
    Univ Chicago, Enrico Fermi Inst, 5640 S Ellis Ave, Chicago, IL 60637 USA..
    Shockley, Evan
    Univ Chicago, Dept Phys, Chicago, IL 60637 USA.;Univ Chicago, Kavli Inst Cosmol Phys, Chicago, IL 60637 USA..
    Stephen, Judith
    Univ Chicago, Enrico Fermi Inst, 5640 S Ellis Ave, Chicago, IL 60637 USA..
    Sundblad, Ragnar
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Thapa, Suchandra
    Univ Chicago, Enrico Fermi Inst, 5640 S Ellis Ave, Chicago, IL 60637 USA..
    Tunnell, Christopher
    Univ Chicago, Dept Phys, Chicago, IL 60637 USA.;Univ Chicago, Kavli Inst Cosmol Phys, Chicago, IL 60637 USA..
    The XENON1T Data Distribution and Processing Scheme. 2019. In: 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP) / [ed] Forti, A; Betev, L; Litmaath, M; Smirnova, O; Hristov, P, EDP Sciences, 2019, Vol. 214, p. 03015-, article id 03015. Conference paper (Refereed)
    Abstract [en]

    The XENON experiment is looking for non-baryonic particle dark matter in the universe. The setup is a dual phase time projection chamber (TPC) filled with 3200 kg of ultra-pure liquid xenon. The setup is operated at the Laboratori Nazionali del Gran Sasso (LNGS) in Italy. We present a full overview of the computing scheme for data distribution and job management in XENON1T. The software package Rucio, which is developed by the ATLAS collaboration, facilitates data handling on Open Science Grid (OSG) and European Grid Infrastructure (EGI) storage systems. A tape copy at the Centre for High Performance Computing (PDC) is managed by the Tivoli Storage Manager (TSM). Data reduction and Monte Carlo production are handled by CI Connect which is integrated into the OSG network. The job submission system connects resources at the EGI, OSG, SDSC's Comet, and the campus HPC resources for distributed computing. The previous success in the XENON1T computing scheme is also the starting point for its successor experiment XENONnT, which starts to take data in autumn 2019.

  • 5.
    Al Ahad, Muhammed Abdullah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Simmendinger, Christian
    T Syst Solut Res GmbH, D-70563 Stuttgart, Germany..
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows. 2018. In: PROCEEDINGS OF PAW-ATM18: 2018 IEEE/ACM PARALLEL APPLICATIONS WORKSHOP, ALTERNATIVES TO MPI (PAW-ATM), IEEE, 2018, p. 1-10. Conference paper (Refereed)
    Abstract [en]

    Collective operations are commonly used in various parts of scientific applications. Especially in strong scaling scenarios, collective operations can negatively impact the overall application performance: while the load per rank here decreases with increasing core counts, time spent in e.g. barrier operations will increase logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploring benefits of this extension, we deliver high performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations clearly achieve 2x-4x performance improvements compared to the best performing MPI implementations for various data distributions.

  • 6.
    Alam, Sadaf R.
    et al.
    Swiss Fed Inst Technol, Swiss Natl Supercomp Ctr CSCS, Zurich, Switzerland..
    Bartolome, Javier
    Barcelona Supercomp Ctr BSC, Barcelona, Spain..
    Carpene, Michele
    Italian Supercomp Ctr CINECA, Casalecchio Di Reno, Italy..
    Happonen, Kalle
    Finnish Supercomp Ctr CSC, Espoo, Finland..
    Lafoucriere, Jacques-Charles
    Commissariat Energie Atom & Energies Alternat CEA, Paris, France..
    Pleiter, Dirk
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science. Juelich Supercomp Ctr JSC, Julich, Germany..
    Fenix: A Pan-European Federation of Supercomputing and Cloud e-Infrastructure Services. 2022. In: Communications of the ACM, ISSN 0001-0782, E-ISSN 1557-7317, Vol. 65, no. 4, p. 46-47. Article in journal (Other academic)
  • 7.
    Alekseenko, Andrej
    et al.
    KTH, School of Engineering Sciences (SCI), Applied Physics, Biophysics.
    Pall, Szilard
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Applied Physics, Biophysics.
    Experiences with Adding SYCL Support to GROMACS. 2021. In: IWOCL'21: Proceedings International Workshop on OpenCL IWOCL 2021, Association for Computing Machinery (ACM), 2021. Conference paper (Refereed)
    Abstract [en]

    GROMACS is an open-source, high-performance molecular dynamics (MD) package primarily used for biomolecular simulations, accounting for 5% of HPC utilization worldwide. Due to the extreme computing needs of MD, significant efforts are invested in improving the performance and scalability of simulations. Target hardware ranges from supercomputers to laptops of individual researchers and volunteers of distributed computing projects such as Folding@Home. The code has been designed both for portability and performance by explicitly adapting algorithms to SIMD and data-parallel processors. A SIMD intrinsic abstraction layer provides high CPU performance. Explicit GPU acceleration has long used CUDA to target NVIDIA devices and OpenCL for AMD/Intel devices. In this talk, we discuss the experiences and challenges of adding support for the SYCL platform into the established GROMACS codebase and share experiences and considerations in porting and optimization. While OpenCL offers the benefits of using the same code to target different hardware, it suffers from several drawbacks that add significant development friction. Its separate-source model leads to code duplication and makes changes complicated. The need to use C99 for kernels, while the rest of the codebase uses C++17, exacerbates these issues. Another problem is that OpenCL, while supported by most GPU vendors, is never the main framework and thus is not getting the primary support or tuning efforts. SYCL alleviates many of these issues, employing a single-source model based on the modern C++ standard. In addition to being the primary platform for Intel GPUs, the possibility to target AMD and NVIDIA GPUs through other implementations (e.g., hipSYCL) might make it possible to reduce the number of separate GPU ports that have to be maintained. 
    Some design differences from OpenCL, such as flow directed acyclic graphs (DAGs) instead of in-order queues, made it necessary to reconsider GROMACS's task scheduling approach and architectural choices in the GPU backend. Additionally, supporting multiple GPU platforms presents a challenge of balancing performance (low-level and hardware-specific code) and maintainability (more generalization and code-reuse). We will discuss the limitations of the existing codebase and interoperability layers with regard to adding the new platform; the compute performance and latency comparisons; code quality considerations; and the issues we encountered with SYCL implementations tested. Finally, we will discuss our goals for the next release cycle for the SYCL backend and the overall architecture of GPU acceleration code in GROMACS.

  • 8.
    Asquith, Nathan L.
    et al.
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England.;Boston Childrens Hosp, Harvard Med Sch, Vasc Biol Program, Karp Res Labs, Boston, MA USA..
    Duval, Cedric
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England..
    Zhmurov, Artem
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. EuroCC Natl Competence Ctr Sweden, Stockholm, Sweden.
    Baker, Stephen R.
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England..
    McPherson, Helen R.
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England..
    Domingues, Marco M.
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England.;Univ Lisbon, Inst Mol Med, Fac Med, Lisbon, Portugal..
    Connell, Simon D. A.
    Univ Leeds, Sch Phys & Astron, Mol & Nanoscale Phys Grp, Leeds, England..
    Barsegov, Valeri
    Univ Massachusetts, Dept Chem, Lowell, MA USA..
    Ariens, Robert A. S.
    Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Sch Med, Discovery & Translat Sci Dept, Leeds, England.;Univ Leeds, Leeds Inst Cardiovasc & Metab Med, Discovery & Translat Sci Dept, Leeds LS2 9JT, England..
    Fibrin protofibril packing and clot stability are enhanced by extended knob-hole interactions and catch-slip bonds. 2022. In: Blood Advances, ISSN 2473-9529, Vol. 6, no. 13, p. 4015-4027. Article in journal (Refereed)
    Abstract [en]

    Fibrin polymerization involves thrombin-mediated exposure of knobs on one monomer that bind to holes available on another, leading to the formation of fibers. In silico evidence has suggested that the classical A:a knob-hole interaction is enhanced by surrounding residues not directly involved in the binding pocket of hole a, via noncovalent interactions with knob A. We assessed the importance of extended knob-hole interactions by performing biochemical, biophysical, and in silico modeling studies on recombinant human fibrinogen variants with mutations at residues responsible for the extended interactions. Three single fibrinogen variants, γD297N, γE323Q, and γK356Q, and a triple variant γDEK (γD297N/γE323Q/γK356Q) were produced in a CHO (Chinese Hamster Ovary) cell expression system. Longitudinal protofibril growth probed by atomic force microscopy was disrupted for γD297N and enhanced for the γK356Q mutation. Initial polymerization rates were reduced for all variants in turbidimetric studies. Laser scanning confocal microscopy showed that γDEK and γE323Q produced denser clots, whereas γD297N and γK356Q were similar to wild type. Scanning electron microscopy and light scattering studies showed that fiber thickness and protofibril packing of the fibers were reduced for all variants. Clot viscoelastic analysis showed that only γDEK was more readily deformable. In silico modeling suggested that most variants displayed only slip-bond dissociation kinetics compared with biphasic catch-slip kinetics characteristics of wild type. These data provide new evidence for the role of extended interactions in supporting the classical knob-hole bonds involving catch-slip behavior in fibrin formation, clot structure, and clot mechanics.

  • 9.
    Atzori, Marco
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Rezaei, Mohammadtaghi
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Laure, E.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. 2022. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 78, no. 3, p. 3605-3620. Article in journal (Refereed)
    Abstract [en]

    In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible to analyze, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for ParaView Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH’s Beskow Cray XC40 supercomputer and assess in situ visualization’s impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.

  • 10.
    Batelaan, M.
    et al.
    Horsley, R.
    Nakamura, Y.
    Perlt, H.
    Pleiter, Dirk
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Rakow, P. E. L.
    Schierholz, G.
    Stüben, H.
    Young, R. D.
    Zanotti, J. M.
    QCDSF-UKQCD-CSSM Collaboration
    Nucleon Form Factors from the Feynman-Hellmann Method in Lattice QCD. 2022. In: Proceedings of Science, Sissa Medialab Srl, 2022. Conference paper (Refereed)
    Abstract [en]

    Lattice QCD calculations of the nucleon electromagnetic form factors are of interest at both the high and low momentum transfer regions. For high momentum transfers especially there are open questions which require more intense study, such as the potential zero crossing in the proton's electric form factor. We will present recent progress from the QCDSF/UKQCD/CSSM collaboration on the calculation of these form factors using the Feynman-Hellmann method in lattice QCD. The Feynman-Hellmann method allows for greater control over excited states which we take advantage of by going to high values of the momentum transfer. In this proceeding we present results of the form factors up to 6 GeV², using N_f = 2 + 1 flavour fermions for three different pion masses in the range 310-470 MeV. The results are extrapolated to the physical pion mass through the use of a flavour breaking expansion.

  • 11.
    Bickerton, J. M.
    et al.
    Cooke, A. N.
    Horsley, R.
    Nakamura, Y.
    Perlt, H.
    Pleiter, Dirk
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Rakow, P. E. L.
    Schierholz, G.
    Stüben, H.
    Young, R. D.
    Zanotti, J. M.
    Patterns of flavour symmetry breaking in hadron matrix elements involving u, d and s quarks. 2022. In: Proceedings of Science, Sissa Medialab Srl, 2022. Conference paper (Refereed)
    Abstract [en]

    Using an SU(3)-flavour symmetry breaking expansion between the strange and light quark masses, we determine how this constrains the extrapolation of baryon octet matrix elements and form factors. In particular we can construct certain combinations, which fan out from the symmetric point (when all the quark masses are degenerate) to the point where the light and strange quarks take their physical values. As a further example we consider the vector amplitude at zero momentum transfer for flavour changing currents.

  • 12.
    Borisov, Vladislav
    et al.
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden..
    Xu, Qichen
    KTH, School of Biotechnology (BIO), Centres, Albanova VinnExcellence Center for Protein Technology, ProNova. KTH, School of Engineering Sciences (SCI), Applied Physics. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Ntallis, Nikolaos
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden..
    Clulow, Rebecca
    Uppsala Univ, Dept Chem, Box 538, SE-75121 Uppsala, Sweden..
    Shtender, Vitalii
    Uppsala Univ, Dept Chem, Box 538, SE-75121 Uppsala, Sweden..
    Cedervall, Johan
    Stockholm Univ, Dept Mat & Environm Chem, SE-10691 Stockholm, Sweden..
    Sahlberg, Martin
    Uppsala Univ, Dept Chem, Box 538, SE-75121 Uppsala, Sweden..
    Wikfeldt, Kjartan Thor
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Thonig, Danny
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden.;Örebro Univ, Sch Sci & Technol, SE-70182 Örebro, Sweden..
    Pereiro, Manuel
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden..
    Bergman, Anders
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden..
    Delin, Anna
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Biotechnology (BIO), Centres, Albanova VinnExcellence Center for Protein Technology, ProNova. KTH, School of Engineering Sciences (SCI), Applied Physics.
    Eriksson, Olle
    Uppsala Univ, Dept Phys & Astron, Box 516, SE-75120 Uppsala, Sweden.;Örebro Univ, Sch Sci & Technol, SE-70182 Örebro, Sweden..
    Tuning skyrmions in B20 compounds by 4d and 5d doping. 2022. In: Physical Review Materials, E-ISSN 2475-9953, Vol. 6, no. 8, article id 084401. Article in journal (Refereed)
    Abstract [en]

    Skyrmion stabilization in novel magnetic systems with the B20 crystal structure is reported here, primarily based on theoretical results. The focus is on the effect of alloying on the 3d sublattice of the B20 structure by substitution of heavier 4d and 5d elements, with the ambition to tune the spin-orbit coupling and its influence on magnetic interactions. State-of-the-art methods based on density functional theory are used to calculate both isotropic and anisotropic exchange interactions. Significant enhancement of the Dzyaloshinskii-Moriya interaction is reported for 5d-doped FeSi and CoSi, accompanied by a large modification of the spin stiffness and spiralization. Micromagnetic simulations coupled to atomistic spin-dynamics and ab initio magnetic interactions reveal the spin-spiral nature of the magnetic ground state and field-induced skyrmions for all these systems. Especially small skyrmions of ~50 nm are predicted for Co0.75Os0.25Si, compared to ~148 nm for Fe0.75Co0.25Si. Convex-hull analysis suggests that all B20 compounds considered here are structurally stable at elevated temperatures and should be possible to synthesize. This prediction is confirmed experimentally by synthesis and structural analysis of the Ru-doped CoSi systems discussed here, both in powder and in single-crystal forms.

  • 13.
    Brand, Manuel
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi. KTH Royal Inst Technol, Dept Theoret Chem & Biol, Sch Engn Sci Chem Biotechnol & Hlth, SE-10691 Stockholm, Sweden..
    Ahmadzadeh, Karan
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi. KTH Royal Inst Technol, Dept Theoret Chem & Biol, Sch Engn Sci Chem Biotechnol & Hlth, SE-10691 Stockholm, Sweden..
    Li, Xin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi. KTH Royal Inst Technol, Dept Theoret Chem & Biol, Sch Engn Sci Chem Biotechnol & Hlth, SE-10691 Stockholm, Sweden..
    Rinkevicius, Zilvinas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi. Department of Physics, Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, LT-51368 Kaunas, Lithuania.
    Saidi, Wissam A.
    Univ Pittsburgh, Dept Mech Engn & Mat Sci, Pittsburgh, PA 15261 USA..
    Norman, Patrick
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi. KTH Royal Inst Technol, Dept Theoret Chem & Biol, Sch Engn Sci Chem Biotechnol & Hlth, SE-10691 Stockholm, Sweden..
    Size-dependent polarizabilities and van der Waals dispersion coefficients of fullerenes from large-scale complex polarization propagator calculations. 2021. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 154, no. 7, article id 074304. Journal article (Refereed)
    Abstract [en]

    While the anomalous non-additive size-dependencies of static dipole polarizabilities and van der Waals C6 dispersion coefficients of carbon fullerenes are well established, the widespread reported scalings for the latter (ranging from N^2.2 to N^2.8) call for a comprehensive first-principles investigation. With a highly efficient implementation of the linear complex polarization propagator, we have performed Hartree-Fock and Kohn-Sham density functional theory calculations of the frequency-dependent polarizabilities for fullerenes consisting of up to 540 carbon atoms. Our results for the static polarizabilities and C6 coefficients show scalings of N^1.2 and N^2.2, respectively, thereby deviating significantly from the previously reported values obtained with the use of semi-classical/empirical methods. Arguably, our reported values are the most accurate to date as they represent the first ab initio or first-principles treatment of fullerenes up to a convincing system size.
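
    A scaling exponent of this kind is typically read off as the slope of a straight-line fit in log-log space. The sketch below illustrates that idea only; it is not the authors' analysis. The function name `fit_power_law`, the prefactor 3.0, and the data are hypothetical, and the values are synthetic (generated from an exact power law), not results from the paper.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of values ~ a * sizes**p in log-log space; returns (a, p)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the log-log line = scaling exponent
    a = math.exp(mean_y - p * mean_x)  # intercept gives the prefactor
    return a, p

# Synthetic data only: generated from an exact N**2.2 law, not taken from the paper.
atom_counts = [60, 70, 84, 180, 240, 540]
synthetic_c6 = [3.0 * n ** 2.2 for n in atom_counts]
prefactor, exponent = fit_power_law(atom_counts, synthetic_c6)
print(f"fitted exponent: {exponent:.3f}")
```

    With real data the fitted exponent depends on which system sizes are included, which is one reason reported scalings can differ between studies.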

  • 14.
    Brand, Manuel
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Kemi, Teoretisk kemi och biologi.
    Dreuw, Andreas
    Ruprecht Karls Univ Heidelberg, Interdisciplinary Ctr Sci Comp, D-69120 Heidelberg, Germany..
    Norman, Patrick
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Kemi, Teoretisk kemi och biologi.
    Li, Xin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Efficient and Parallel Implementation of Real and Complex Response Functions Employing the Second-Order Algebraic-Diagrammatic Construction Scheme for the Polarization Propagator. 2023. In: Journal of Chemical Theory and Computation, ISSN 1549-9618, E-ISSN 1549-9626, Vol. 20, no. 1, pp. 103-113. Journal article (Refereed)
    Abstract [en]

    We present the implementation of an efficient matrix-folded formalism for the evaluation of complex response functions and the calculation of transition properties at the level of the second-order algebraic-diagrammatic construction (ADC(2)) scheme. The underlying algorithms, in combination with the adopted hybrid MPI/OpenMP parallelization strategy, enabled calculations of the UV/vis spectra of a guanine oligomer series ranging up to 1032 contracted basis functions, thereby utilizing vast computational resources from up to 32,768 CPU cores. Further analysis of the convergence behavior of the involved iterative subspace algorithms revealed the superiority of a frequency-separated treatment of response equations even for a large spectral window, including 101 frequencies. We demonstrate the applicability to general quantum mechanical operators by the first reported electronic circular dichroism spectrum calculated with a complex polarization propagator approach at the ADC(2) level of theory.

  • 15.
    Brank, Bine
    et al.
    Forschungszentrum Julich, Julich Supercomp Ctr, Julich, Germany..
    Pleiter, Dirk
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Assessing the State of Autovectorization Support based on SVE. 2022. In: 2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 556-562. Conference paper (Refereed)
    Abstract [en]

    So-called SIMD instructions, which process a tuple of data elements in each clock cycle, have become widespread in modern processor architectures. In particular, processors for high-performance computing (HPC) systems rely on this additional level of parallelism to reach a high throughput of arithmetic operations. Leveraging these SIMD instructions can still be challenging for application software developers. This challenge has become simpler due to a compiler technique called auto-vectorization. In this paper, we explore the current auto-vectorization capabilities of state-of-the-art compilers, using a recent extension of the Arm instruction set architecture called SVE. We measure the performance gains on a recent processor architecture supporting SVE, namely the Fujitsu A64FX processor.

  • 16.
    Brocke, Ekaterina
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Djurfeldt, Mikael
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Efficient Spike Communication in the MUSIC Framework on a Blue Gene/Q Supercomputer. Manuscript (preprint) (Other (popular science, discussion, etc.))
  • 17. Camisasca, G.
    et al.
    Pathak, H.
    Wikfeldt, Kjartan Thor
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. Department of Physics, AlbaNova University Center, Stockholm University, Stockholm, SE-10609, Sweden.
    Pettersson, L. Gunnar M.
    Radial distribution functions of water: Models vs experiments. 2019. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 151, no. 4, article id 044502. Journal article (Refereed)
    Abstract [en]

    We study the temperature behavior of the first four peaks of the oxygen-oxygen radial distribution function of water, simulated by the TIP4P/2005, MB-pol, TIP5P, and SPC/E models and compare to experimental X-ray diffraction data, including a new measurement which extends down to 235 K [H. Pathak et al., J. Chem. Phys. 150, 224506 (2019)]. We find the overall best agreement using the MB-pol and TIP4P/2005 models. We observe, upon cooling, a minimum in the position of the second shell simulated with TIP4P/2005 and SPC/E potentials, located close to the temperature of maximum density. We also calculated the two-body entropy and the contributions coming from the first, second, and outer shells to this quantity. We show that, even if the main contribution comes from the first shell, the contribution of the second shell can become important at low temperature. While real water appears to be less ordered at short distance than obtained by any of the potentials, the different water potentials show more or less order compared to the experiments depending on the considered length-scale.
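
    For reference, the two-body entropy is commonly defined from the radial distribution function g(r) as s2/kB = -2πρ ∫ [g ln g - g + 1] r² dr. The sketch below evaluates that integral with the trapezoid rule; it is a generic illustration, not the authors' code. The function name, the grid, the number density 0.0334 (in inverse cubic ångström), and the step-function g(r) are all hypothetical toy inputs.

```python
import math

def two_body_entropy(r, g, rho):
    """Return s2/k_B = -2*pi*rho * integral of [g ln g - g + 1] r^2 dr (trapezoid rule)."""
    def integrand(ri, gi):
        # g ln g tends to 0 as g tends to 0, so handle g == 0 explicitly.
        glng = gi * math.log(gi) if gi > 0.0 else 0.0
        return (glng - gi + 1.0) * ri * ri

    f = [integrand(ri, gi) for ri, gi in zip(r, g)]
    total = 0.0
    for i in range(1, len(r)):
        total += 0.5 * (f[i] + f[i - 1]) * (r[i] - r[i - 1])
    return -2.0 * math.pi * rho * total

# Toy inputs (hypothetical): uniform grid up to 10 length units.
r_grid = [0.01 * i for i in range(1001)]
g_ideal = [1.0] * len(r_grid)                             # structureless fluid
g_core = [0.0 if ri < 2.8 else 1.0 for ri in r_grid]      # excluded core only

s2_ideal = two_body_entropy(r_grid, g_ideal, rho=0.0334)
s2_core = two_body_entropy(r_grid, g_core, rho=0.0334)
print(s2_ideal, s2_core)
```

    Restricting the integration range to one coordination shell gives that shell's contribution, which is how per-shell contributions of the kind discussed in the abstract can be separated.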

  • 18.
    Camisasca, Gaia
    et al.
    Stockholm Univ, Dept Phys, S-10691 Stockholm, Sweden..
    Galamba, Nuno
    Univ Lisbon, Fac Sci, Ctr Chem & Biochem, C8 Campo Grande, P-1749016 Lisbon, Portugal.;Univ Lisbon, Fac Sci, Biosyst & Integrat Sci Inst, C8 Campo Grande, P-1749016 Lisbon, Portugal..
    Wikfeldt, Kjartan Thor
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. Stockholm Univ, Dept Phys, S-10691 Stockholm, Sweden..
    Pettersson, Lars G. M.
    Stockholm Univ, Dept Phys, S-10691 Stockholm, Sweden..
    Translational and rotational dynamics of high and low density TIP4P/2005 water. 2019. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 150, no. 22, article id 224507. Journal article (Refereed)
    Abstract [en]

    We use molecular dynamics simulations with the TIP4P/2005 model to investigate the self- and distinct-van Hove functions for different local environments of water, classified using the local structure index as an order parameter. The orientational dynamics were studied through the calculation of the time-correlation functions of different-order Legendre polynomials in the OH-bond unit vector. We found that the translational and orientational dynamics are slower for molecules in a low-density local environment and correspondingly the mobility is enhanced upon increasing the local density, consistent with some previous works, but opposite to a recent study on the van Hove function. From the analysis of the distinct dynamics, we find that the second and fourth peaks of the radial distribution function, previously identified as low density-like arrangements, show long persistence in time. The analysis of the time-dependent interparticle distance between the central molecule and the first coordination shell shows that particle identity persists longer than distinct van Hove correlations. The motion of two first-nearest-neighbor molecules thus remains coupled even when this correlation function has completely decayed. With respect to the orientational dynamics, we show that correlation functions of molecules in a low-density environment decay exponentially, while molecules in a local high-density environment exhibit bi-exponential decay, indicating that dynamic heterogeneity of water is associated with the heterogeneity among high-density and between high-density and low-density species. This bi-exponential behavior is associated with the existence of interstitial waters and the collapse of the second coordination sphere in high-density arrangements, but not with H-bond strength.

  • 19. Chien, Steven W. D.
    et al.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Sishtla, Chaitanya Prasad
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Santos, Luis
    Herman, Pawel
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Narasimhamurthy, Sai
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Characterizing Deep-Learning I/O Workloads in TensorFlow. 2018. In: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2018, pp. 54-63. Conference paper (Refereed)
    Abstract [en]

    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during the training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to an accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small-capacity storage and asynchronously copy the checkpoints to a slower large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
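
    The burst-buffer idea described above can be sketched in a few lines: the application blocks only while writing to the fast tier, and a background thread drains checkpoints to the slow tier. This is a minimal illustration using the Python standard library, not the authors' implementation and not TensorFlow's checkpoint API. The class name `BurstBufferCheckpointer`, the file names, and the use of two temporary directories as stand-ins for fast and slow storage are all hypothetical.

```python
import os
import queue
import shutil
import tempfile
import threading

class BurstBufferCheckpointer:
    """Write checkpoints to fast storage and drain them to slow storage in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = fast_dir
        self.slow_dir = slow_dir
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def save(self, name, payload):
        # The caller blocks only for the write to the fast tier.
        fast_path = os.path.join(self.fast_dir, name)
        with open(fast_path, "wb") as handle:
            handle.write(payload)
        self._pending.put(name)
        return fast_path

    def _drain(self):
        # Background copy from the fast tier to the slow tier.
        while True:
            name = self._pending.get()
            if name is None:
                self._pending.task_done()
                break
            shutil.copy2(os.path.join(self.fast_dir, name),
                         os.path.join(self.slow_dir, name))
            self._pending.task_done()

    def close(self):
        # Wait until every queued checkpoint has reached slow storage.
        self._pending.put(None)
        self._pending.join()
        self._worker.join()

# Two temporary directories stand in for the fast and slow storage tiers.
with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
    ckpt = BurstBufferCheckpointer(fast, slow)
    for step in range(3):
        ckpt.save(f"step-{step}.ckpt", b"model state %d" % step)
    ckpt.close()
    copied = sorted(os.listdir(slow))
```

    In this sketch the training loop would call `save` and continue immediately; any speedup depends on how much faster the fast tier is than the slow one.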

  • 20.
    Chien, Steven W.D.
    et al.
    University of Edinburgh, United Kingdom.
    Sato, Kento
    RIKEN Center for Computational Science Japan.
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Programvaruteknik och datorsystem, SCS.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Honda, Michio
    University of Edinburgh, United Kingdom.
    Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications. 2023. In: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023, Association for Computing Machinery (ACM), 2023, pp. 172-173. Conference paper (Refereed)
    Abstract [en]

    Cloud providers have begun to offer managed services to attract scientific applications, which have traditionally been executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications or the MPI library, and no support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application, Neko, by 13-40%.

  • 21.
    Chien, Steven Wei Der
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Olshevsky, Vyacheslav
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Bulatov, Yaroslav
    South Pk Commons, San Francisco, CA USA..
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Vetter, Jeffrey S.
    Oak Ridge Natl Lab, Oak Ridge, TN USA..
    TensorFlow Doing HPC: An Evaluation of TensorFlow Performance in HPC Applications. 2019. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 509-518. Conference paper (Refereed)
    Abstract [en]

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it aims at supporting the development of a much broader range of applications outside the ML domain, possibly including HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential of emerging also as an HPC programming framework for heterogeneous supercomputers.

  • 22.
    Chien, Steven Wei Der
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Sishtla, Chaitanya Prasad
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Jun, Zhang
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Peng, Ivy Bo
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems. 2018. In: Proceedings of the 5th International Conference on Exascale Applications and Software, The University of Edinburgh, 2018, pp. 34-. Conference paper (Refereed)
    Abstract [en]

    Computationally intensive applications, such as pattern recognition and natural language processing, are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of the artificial neural network. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices that flow between operations are called Tensors. For this reason, computation problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices such as GPUs and CPUs on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen and cuBLAS. Inter-node communication can be through TCP and RDMA. This work attempts to evaluate the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device and a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary result with distributed matrix multiplication shows that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC.
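
    The block decomposition behind such a distributed matrix multiplication can be sketched without any framework: each output block is a sum of independent block products, and it is those products that could be assigned to different devices. The code below is a plain-Python illustration under that assumption, not the authors' TensorFlow prototype; all function names and the 4x4 test matrices are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_blocks(m, bs):
    """Split a square matrix into a grid of blocks of size at most bs x bs."""
    n = len(m)
    return [[[row[j:j + bs] for row in m[i:i + bs]]
             for j in range(0, n, bs)] for i in range(0, n, bs)]

def block_matmul(a, b, bs):
    """C[i][j] = sum over k of A[i][k] @ B[k][j], computed block by block."""
    ab, bb = split_blocks(a, bs), split_blocks(b, bs)
    nb = len(ab)
    c_blocks = [[None] * nb for _ in range(nb)]
    for i in range(nb):
        for j in range(nb):
            # Each (i, j, k) product is independent of the others and could
            # be placed on a different device in a distributed setting.
            acc = matmul(ab[i][0], bb[0][j])
            for k in range(1, nb):
                acc = add(acc, matmul(ab[i][k], bb[k][j]))
            c_blocks[i][j] = acc
    # Stitch the grid of blocks back into one matrix.
    return [sum((blk[r] for blk in block_row), [])
            for block_row in c_blocks for r in range(len(block_row[0]))]

# Small integer matrices (hypothetical) so the comparison is exact.
a = [[i * 4 + j for j in range(4)] for i in range(4)]
b = [[(i + j) % 3 for j in range(4)] for i in range(4)]
c_blocked = block_matmul(a, b, bs=2)
c_direct = matmul(a, b)
```

    Only the partial sums for one output block need to be gathered in one place, which is what allows matrices that do not fit on a single device to be multiplied.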

  • 23.
    Chien, Wei Der
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    An Evaluation of TensorFlow as a Programming Framework for HPC Applications. 2018. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [sv, translated to English]

    In recent years, deep learning, a type of machine learning, has become popular because of its applications and performance. The most important component of these techniques is matrix multiplication. Graphics processing units (GPUs) are commonly used for training artificial neural networks because of their massively parallel computing capacity. In addition, specialized low-precision accelerators that specifically compute matrix multiplications have been developed. Many development frameworks have emerged to help programmers work with artificial neural networks. In TensorFlow, computational problems are expressed as a computational graph. A node represents a computational operation and an edge represents data flow between operations in the graph. Because different accelerators with different system architectures must be programmed, programming high-performance systems has become increasingly difficult. TensorFlow offers a high level of abstraction and simplifies the programming of high-performance computations. Accelerators are programmed by placing operations in the graph on different accelerators through an API. This work examines the usability of TensorFlow as a programming framework for high-performance computing applications. We present TensorFlow as a programming framework for distributed computation. We implement two common applications in TensorFlow: a solver for linear systems of equations using the conjugate gradient method, and block matrix multiplication, and illustrate how these problems can be expressed as computational graphs for distributed computation. We experiment with and comment on methods to demonstrate how TensorFlow can exploit HPC hardware. We test both scalability and efficiency and micro-benchmark the communication performance. Through this work we show that TensorFlow is an emerging and promising platform that is well suited to a particular class of problems requiring minimal synchronization.
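
    For readers unfamiliar with the conjugate gradient method mentioned in this abstract, the textbook iteration is sketched below in plain Python. It is not the thesis code and does not use TensorFlow; the function name, tolerance, and the 2x2 example system are hypothetical. Each iteration needs one matrix-vector product and two dot products, and the dot products are the synchronization points in a distributed setting.

```python
def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given as a list of rows."""
    def matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Small symmetric positive-definite example system (hypothetical).
a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(a, b)
print(x)
```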

  • 24.
    de Gracia Triviño, Juan Angel
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Kemi, Teoretisk kemi och biologi. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Ahlquist, Mårten S. G.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Kemi, Teoretisk kemi och biologi.
    Removing the Barrier in O-O Bond Formation Via the Combination of Intramolecular Radical Coupling and the Oxide Relay Mechanism. 2024. In: Journal of Physical Chemistry A, ISSN 1089-5639, E-ISSN 1520-5215, Vol. 128, no. 19, pp. 3794-3800. Journal article (Refereed)
    Abstract [en]

    The Ru(tda) catalyst has been a major milestone in the development of molecular water oxidation catalysts due to its outstanding performance at neutral pH. The role of the noncoordinating carboxylate group is to act as a nucleophile, donating an oxygen atom to the oxo group, thereby acting as an oxide relay (OR) mechanism for O-O bond formation. A substitution of the carboxylates for phosphonate groups has been proposed, resulting in the Ru(tPaO) catalyst, which has shown even more efficient performance in experimental characterization. In this study, we explore the feasibility of the OR mechanism in the newly reported Ru(tPaO) molecular catalyst. We investigated the catalytic cycle using density functional theory and identified a variation of the OR mechanism that involves radical oxygen atoms in O-O bond formation. We have also determined that the subsequent hydroxide nucleophilic attack is the sole rate-limiting step in the catalytic cycle. All activation free energies are very low, with a free-energy barrier of 2.1 kcal/mol for O-O bond formation and 4.2 kcal/mol for OH- nucleophilic attack.
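
    To give a feel for what barriers of this size imply, the Eyring equation k = (kB T / h) exp(-ΔG‡ / RT) converts an activation free energy into a rate constant. The sketch below applies it to the two barriers quoted above. The temperature of 298.15 K and a transmission coefficient of 1 are assumptions made here for illustration; neither is stated in the abstract, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J*s
R = 1.98720425864083e-3   # gas constant, kcal/(mol*K)

def eyring_rate(barrier_kcal_mol, temperature=298.15):
    """Eyring rate constant in 1/s, assuming a transmission coefficient of 1."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal_mol / (R * temperature))

# Barriers quoted in the abstract; the temperature is an assumption.
rate_oo = eyring_rate(2.1)   # O-O bond formation
rate_oh = eyring_rate(4.2)   # OH- nucleophilic attack
print(f"{rate_oo:.2e} 1/s, {rate_oh:.2e} 1/s, ratio {rate_oo / rate_oh:.1f}")
```

    Under these assumptions the step with the higher barrier is slower by a factor of exp(2.1 / RT), which is consistent with the abstract identifying the hydroxide attack as rate-limiting.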

  • 25. De La Motte, S. A.
    et al.
    Hollitt, S. E.
    Horsley, R.
    Jackson, P. D.
    Nakamura, Y.
    Perlt, H.
    Pleiter, Dirk
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Rakow, P. E. L.
    Schierholz, G.
    Stüben, H.
    Young, R. D.
    Zanotti, J. M.
    Measurements of SU(3)_f symmetry breaking in B meson decay constants. 2022. In: Proceedings of Science, Sissa Medialab Srl, 2022. Conference paper (Refereed)
    Abstract [en]

    We present updates from QCDSF/UKQCD/CSSM on the SU(3)_f breaking in B meson decay constants. The b-quarks are generated with an anisotropic clover-improved action, and are tuned to match properties of the physical B and B* mesons. Configurations are generated with m = (2m_l + m_s)/3 kept constant to control symmetry breaking effects. Various sources of systematic uncertainty will be discussed, including those from continuum extrapolations and extrapolations to the physical point. We also present new efforts to calculate f_B and f_Bs using weighted averages across multiple time fitting regions. The use of an automated weighted averaging technique over multiple fitting ranges allows for timely tuning of the b-quark and reduces the impact of systematic errors from fitting range biases in calculations of f_B and f_Bs.

  • 26.
    D’Orto, Manolo
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS).
    Sjöblom, Svante
    Chien, Lung Sheng
    Axner, Lilit
    ENCCS, Uppsala University.
    Gong, Jing
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Comparing Different Approaches for Solving Large Scale Power-flow Problems with the Newton-Raphson Method. 2021. In: IEEE Access, E-ISSN 2169-3536, Vol. 9, pp. 56604-56615. Journal article (Refereed)
    Abstract [en]

    This paper focuses on using the Newton-Raphson method to solve power-flow problems. Since the most computationally demanding part of the Newton-Raphson method is solving the linear equations at each iteration, this study investigates different approaches to solving the linear equations on both the central processing unit (CPU) and the graphics processing unit (GPU). Six different approaches have been developed and evaluated in this paper: two run entirely on the CPU, two run entirely on the GPU, and the remaining two are hybrid approaches that run on both the CPU and the GPU. All six direct linear solvers use either LU or QR factorization to solve the linear equations. Two different hardware platforms have been used to conduct the experiments. The performance results show that the CPU version with LU factorization gives better performance than the GPU version using the standard cuSOLVER library, even for the larger power-flow problems. Moreover, it is shown that the best performance is achieved using a hybrid method where the Jacobian matrix is assembled on the GPU, the preprocessing with a sparse high-performance linear solver called KLU is performed on the CPU in the first iteration, and the linear equation is factorized on the GPU and solved on the CPU. The maximum speedup in this study is obtained on the largest case, with 25000 buses. The hybrid version shows a speedup factor of 9.6 with an NVIDIA P100 GPU and 13.1 with an NVIDIA V100 GPU in comparison with the baseline CPU version on an Intel Xeon Gold 6132 CPU.
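
    The structure of the Newton-Raphson method, and why the linear solve dominates its cost, can be seen in a small dense sketch. The code below is a generic illustration, not one of the six solvers from the paper: it uses Gaussian elimination with partial pivoting on a toy 2x2 nonlinear system rather than sparse LU or QR on a power-flow Jacobian. All names and the example system are hypothetical.

```python
def solve_linear(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def newton_raphson(f, jacobian, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x + dx, where J(x) dx = -F(x), until the residual is small."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            break
        # The linear solve is the dominant cost of each iteration.
        dx = solve_linear(jacobian(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    return x

# Toy system (not a power-flow model): x^2 + y^2 = 4 and x * y = 1.
def residual(v):
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0]

def jac(v):
    return [[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]]

root = newton_raphson(residual, jac, [2.0, 0.5])
print(root)
```

    In a power-flow setting the Jacobian is large and sparse, so `solve_linear` would be replaced by a sparse factorization; that substitution is where the CPU, GPU, and hybrid variants differ.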

  • 27.
    Dugani, Vishwanath
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC).
    Continuous system-wide profiling of High Performance Computing parallel applications: Profiling high performance applications. 2016. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [sv, translated to English]

    Profiling an application identifies the parts of the code being executed, using hardware performance counters, and thereby gives insight into the application's performance. Profiling has long been standard in the development process, focused on a single execution of a single program. As computer systems have evolved, understanding the bigger picture across multiple machines has become increasingly important. As supercomputers grow in impact and scale, understanding the performance and utilization characteristics of parallel applications is critically important, since even minor performance improvements translate into large cost savings. The study reviews various tools for this purpose. Perfminer was then integrated into Scania's Linux clusters to profile CFD and FEA applications, exploiting batch queue system features for continuous system-wide profiling, which provides performance insights for high-performance applications with negligible overhead. Perfminer provides stable, accurate profiles and a cluster-scale tool for performance analysis. Perfminer effectively highlights microarchitectural bottlenecks.

  • 28.
    Dyczynski, Matheus
    et al.
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden..
    Yu, Yasmin
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden.;Sprint Biosci, Huddinge, Sweden..
    Otrocka, Magdalena
    Karolinska Inst, Dept Med Biochem & Biophys, Sci Life Lab Stockholm, Chem Biol Consortium Sweden, Solna, Sweden..
    Parpal, Santiago
    Sprint Biosci, Huddinge, Sweden..
    Braga, Tiago
    Sprint Biosci, Huddinge, Sweden..
    Henley, Aine Brigette
    Sprint Biosci, Huddinge, Sweden..
    Zazzi, Henric
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Lerner, Mikael
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden..
    Wennerberg, Krister
    Univ Helsinki, Inst Mol Med Finland, FIMM, Helsinki, Finland..
    Viklund, Jenny
    Sprint Biosci, Huddinge, Sweden..
    Martinsson, Jessica
    Sprint Biosci, Huddinge, Sweden..
    Grander, Dan
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden..
    De Milito, Angelo
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden.;Sprint Biosci, Huddinge, Sweden..
    Tamm, Katja Pokrovskaja
    Karolinska Inst, Dept Oncol Pathol, Canc Ctr Karolinska, Stockholm, Sweden..
    Targeting autophagy by small molecule inhibitors of vacuolar protein sorting 34 (Vps34) improves the sensitivity of breast cancer cells to Sunitinib. 2018. In: Cancer Letters, ISSN 0304-3835, E-ISSN 1872-7980, Vol. 435, pp. 32-43. Journal article (Refereed)
    Abstract [en]

    Resistance to chemotherapy is a challenging problem for treatment of cancer patients and autophagy has been shown to mediate development of resistance. In this study we systematically screened a library of 306 known anti-cancer drugs for their ability to induce autophagy using a cell-based assay. 114 of the drugs were classified as autophagy inducers; for 16 drugs, the cytotoxicity was potentiated by siRNA-mediated knock-down of Atg7 and Vps34. These drugs were further evaluated in breast cancer cell lines for autophagy induction, and two tyrosine kinase inhibitors, Sunitinib and Erlotinib, were selected for further studies. For the pharmacological inhibition of autophagy, we have characterized here a novel, highly potent, selective inhibitor of Vps34, SB02024. SB02024 blocked autophagy in vitro and reduced xenograft growth of two breast cancer cell lines, MDA-MB-231 and MCF-7, in vivo. The Vps34 inhibitor significantly potentiated cytotoxicity of Sunitinib and Erlotinib in MCF-7 and MDA-MB-231 in vitro in monolayer cultures and when grown as multicellular spheroids. Our data suggest that inhibition of autophagy significantly improves sensitivity to Sunitinib and Erlotinib and that Vps34 is a promising therapeutic target for combination strategies in breast cancer.

  • 29.
    Dykes, Tim
    et al.
    HPE HPC/AI EMEA Research Lab.
    Foyer, Clément
    HPE HPC/AI EMEA Research Lab, HPC Research Group, Univ. of Bristol.
    Richardson, Harvey
    HPE HPC/AI EMEA Research Lab.
    Svedin, Martin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Tate, Adrian
    Numerical Algorithms Group Ltd. (NAG).
    McIntosh-Smith, Simon
    HPC Research Group, Univ. of Bristol.
    Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems. 2021. In: 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper (Refereed)
    Abstract [en]

    High performance computing architectures have become increasingly heterogeneous in recent times. This growing architectural variety presents a multi-faceted portability problem affecting applications, libraries, programming models, languages, compilers, run-times, and system software. Approaches for performance portability typically focus heavily on efficient usage of parallel compute architectures and less on data locality abstractions and complex memory systems, with minimal support afforded to effective memory management in traditional HPC languages such as C and Fortran. We present Mamba, a library to facilitate usage of heterogeneous memory systems by high performance application/library developers through high level array-based abstractions for memory management supported by a low-level generic memory API. We detail the library design and implementation, demonstrating generic memory allocation, data layout specification, array tiling and heterogeneous transport. We evaluate performance in the context of a typical matrix transposition, DNA sequencing benchmark, and an application use case for high-order spectral element based incompressible flow.

  • 30. Eliasson, P.
    et al.
    Gong, Jing
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Nordström, J.
    A stable and conservative coupling of the unsteady compressible Navier-Stokes equations at interfaces using finite difference and finite volume methods. 2018. In: AIAA Aerospace Sciences Meeting, 2018, American Institute of Aeronautics and Astronautics Inc, AIAA, 2018, no. 210059. Conference paper (Refereed)
    Abstract [en]

    Stable and conservative interface boundary conditions are developed for the unsteady compressible Navier-Stokes equations using finite difference and finite volume methods. The finite difference approach is based on summation-by-parts operators and can be made higher-order accurate with boundary conditions imposed weakly. The finite volume approach is an edge- and dual grid-based approach for unstructured grids, formally second-order accurate in space, also with weak boundary conditions. Stable and conservative weak boundary conditions are derived for interfaces between finite difference methods, for interfaces between finite volume methods, and for the coupling between the two approaches. The three types of interface boundary conditions are demonstrated for two test cases. Firstly, inviscid vortex propagation with a known analytical solution is considered. The results show the expected error decay as the grid is refined for various couplings and spatial accuracies of the finite difference scheme. The second test case involves viscous laminar flow over a cylinder with vortex shedding. Calculations with various couplings and spatial accuracies of the finite difference solver show that the couplings work as expected and that the higher-order finite difference schemes provide enhanced vortex propagation.

  • 31.
    Eriksson, Olivia
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Lindahl, Erik
    KTH, Skolan för teknikvetenskap (SCI), Tillämpad fysik, Biofysik. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Henningson, Dan S.
    KTH, Skolan för teknikvetenskap (SCI), Mekanik, Stabilitet, Transition, Kontroll. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Ynnerman, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    e-Science in Scandinavia. 2018. In: Informatik-Spektrum, ISSN 0170-6012, E-ISSN 1432-122X, Vol. 41, no. 6, pp. 398-404. Article in journal (Refereed)
  • 32. Fedorov, V. A.
    et al.
    Kholina, E. G.
    Kovalenko, I. B.
    Gudimchuk, N. B.
    Orekhov, P. S.
    Zhmurov, Artem
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Update on Performance Analysis of Different Computational Architectures: Molecular Dynamics in Application to Protein-Protein Interactions. 2020. In: Supercomputing Frontiers and Innovations, ISSN 2409-6008, Vol. 7, no. 4, pp. 62-67. Article in journal (Refereed)
    Abstract [en]

    Molecular dynamics has proved itself a powerful computer simulation method for studying the dynamics, conformational changes, and interactions of biological macromolecules and their complexes. In order to achieve the best performance and efficiency, it is crucial to benchmark various hardware platforms for simulations of realistic biomolecular systems of different sizes and timescales. Here, we compare the performance and scalability of a number of commercially available computing architectures using all-atom and coarse-grained molecular dynamics simulations of water and the Ndc80-microtubule protein complex in the GROMACS-2019.4 package. We report typical single-node performance of various combinations of modern CPUs and GPUs, as well as multiple-node performance of the “Lomonosov-2” supercomputer. These data can be used as practical guidelines for choosing optimal hardware for molecular dynamics simulations.

  • 33.
    Haine, Christopher
    et al.
    HPE HPC AI Res Lab, Basel, Switzerland..
    Haus, Utz-Uwe
    HPE HPC AI Res Lab, Basel, Switzerland..
    Martinasso, Maxime
    Swiss Natl Supercomp Ctr, CSCS, CH-6900 Lugano, Switzerland..
    Pleiter, Dirk
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. Forschungszentrum Julich, D-52425 Julich, Germany..
    Tessier, Francois
    Inria Rennes Bretagne Atlantique, F-35042 Rennes, France..
    Sarmany, Domokos
    European Ctr Medium Range Weather Forecasts ECMWF, Reading RG2 9AX, Berks, England..
    Smart, Simon
    European Ctr Medium Range Weather Forecasts ECMWF, Reading RG2 9AX, Berks, England..
    Quintino, Tiago
    European Ctr Medium Range Weather Forecasts ECMWF, Reading RG2 9AX, Berks, England..
    Tate, Adrian
    NAG, Oxford, England..
    A Middleware Supporting Data Movement in Complex and Software-Defined Storage and Memory Architectures. 2021. In: High Performance Computing - ISC High Performance Digital 2021 International Workshops / [ed] Jagode, H; Anzt, H; Ltaief, H; Luszczek, P, Springer Nature, 2021, Vol. 12761, pp. 346-357. Conference paper (Refereed)
    Abstract [en]

    Among the broad variety of challenges that arise from workloads in a converged HPC and Cloud infrastructure, data movement is of paramount importance, especially on upcoming exascale systems featuring multiple tiers of memory and storage. While the focus has, for years, been primarily on optimizing computations, the importance of improving data handling on such architectures is now well understood. As optimization techniques can be applied at different stages (operating system, run-time system, programming environment, and so on), a middleware providing uniform and consistent data awareness becomes necessary. In this paper, we introduce a novel memory- and data-aware middleware called Maestro, designed for data orchestration.

  • 34.
    Hasan, Md Nur
    et al.
    Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block JD, Sector-III, SaltLake, Kolkata 700 106, India.
    Bharati, Ritadip
    School of Physical Sciences, National Institute of Science Education and Research HBNI, Jatni - 752050, Odisha, India.
    Hellsvik, Johan
    KTH, Skolan för teknikvetenskap (SCI), Fysik, Kondenserade materiens teori. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Delin, Anna
    KTH, Skolan för teknikvetenskap (SCI), Tillämpad fysik, Material- och nanofysik. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Pal, Samir Kumar
    Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block JD, Sector-III, SaltLake, Kolkata 700 106, India.
    Bergman, Anders
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Sharma, Shivalika
    Asia Pacific Center for Theoretical Physics, Pohang 37673, Republic of Korea.
    Di Marco, Igor
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden; Asia Pacific Center for Theoretical Physics, Pohang 37673, Republic of Korea; Department of Physics, Pohang University of Science and Technology, Pohang 37673, Republic of Korea.
    Pereiro, Manuel
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Thunström, Patrik
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Oppeneer, Peter M.
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Eriksson, Olle
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Karmakar, Debjani
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden; Technical Physics Division, Bhabha Atomic Research Centre, Mumbai 400085, India.
    Magnetism in AV3Sb5 (A=Cs, Rb, and K): Origin and Consequences for the Strongly Correlated Phases. 2023. In: Physical Review Letters, ISSN 0031-9007, E-ISSN 1079-7114, Vol. 131, no. 19, article id 196702. Article in journal (Refereed)
    Abstract [en]

    The V-based kagome systems AV3Sb5 (A=Cs, Rb, and K) are unique by virtue of the intricate interplay of nontrivial electronic structure, topology, and intriguing fermiology, making them a playground for many mutually dependent exotic phases such as charge order and superconductivity. Despite numerous recent studies, the interconnection of magnetism and other complex collective phenomena in these systems remains unresolved. Using first-principles tools, we demonstrate that their electronic structures, complex fermiologies and phonon dispersions are strongly influenced by the interplay of dynamic electron correlations, nontrivial spin-polarization and spin-orbit coupling. An investigation of the first-principles-derived intersite magnetic exchanges, together with a complementary analysis of the q dependence of the electronic response functions and the electron-phonon coupling, indicates that the system behaves as a frustrated spin cluster, where the occurrence of the charge-order phase is intimately related to the mechanism of electron-phonon coupling rather than to Fermi-surface nesting.

  • 35.
    Jansen, Karin A.
    et al.
    AMOLF, Biol Soft Matter Grp, Utrecht, Netherlands.;UMC Utrecht, Dept Pathol, NL-3508 GA Utrecht, Netherlands..
    Zhmurov, Artem
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. Sechenov Univ, Moscow 119991, Russia..
    Vos, Bart E.
    AMOLF, Biol Soft Matter Grp, Utrecht, Netherlands.;Univ Munster, Ctr Mol Biol Inflammat, Inst Cell Biol, Munster, Germany..
    Portale, Giuseppe
    Univ Groningen, Zernike Inst Adv Mat, Macromol Chem & New Polymer Mat, Nijenborgh 4, NL-9747 AG Groningen, Netherlands..
    Hermida-Merino, Daniel
    ESRF, DUBBLE CRG, Netherlands Org Sci Res NWO, 71 Ave Martyrs, F-38000 Grenoble, France..
    Litvinov, Rustem, I
    Univ Penn, Perelman Sch Med, Dept Cell & Dev Biol, Philadelphia, PA 19104 USA.;Kazan Fed Univ, Inst Fundamental Med & Biol, 18 Kremlyovskaya St, Kazan 420008, Russia..
    Tutwiler, Valerie
    Univ Penn, Perelman Sch Med, Dept Cell & Dev Biol, Philadelphia, PA 19104 USA..
    Kurniawan, Nicholas A.
    AMOLF, Biol Soft Matter Grp, Utrecht, Netherlands.;Eindhoven Univ Technol, Dept Biomed Engn, Eindhoven, Netherlands.;Eindhoven Univ Technol, Inst Complex Mol Syst, Eindhoven, Netherlands..
    Bras, Wim
    ESRF, DUBBLE CRG, Netherlands Org Sci Res NWO, 71 Ave Martyrs, F-38000 Grenoble, France.;Oak Ridge Natl Lab, Chem Sci Div, One Bethel Valley Rd, Oak Ridge, TN 37831 USA..
    Weisel, John W.
    Univ Penn, Perelman Sch Med, Dept Cell & Dev Biol, Philadelphia, PA 19104 USA..
    Barsegov, Valeri
    Univ Massachusetts, Dept Chem, 1 Univ Ave, Lowell, MA 01854 USA..
    Koenderink, Gijsje H.
    AMOLF, Biol Soft Matter Grp, Utrecht, Netherlands.;Delft Univ Technol, Kavli Inst Nanosci Delft, Dept Bionanosci, Maasweg 9, NL-2629 HZ Delft, Netherlands..
    Molecular packing structure of fibrin fibers resolved by X-ray scattering and molecular modeling. 2020. In: Soft Matter, ISSN 1744-683X, E-ISSN 1744-6848, Vol. 16, no. 35, pp. 8272-8283. Article in journal (Refereed)
    Abstract [en]

    Fibrin is the major extracellular component of blood clots and a proteinaceous hydrogel used as a versatile biomaterial. Fibrin forms branched networks built of laterally associated double-stranded protofibrils. This multiscale hierarchical structure is crucial for the extraordinary mechanical resilience of blood clots, yet the structural basis of clot mechanical properties remains largely unclear due, in part, to the unresolved molecular packing of fibrin fibers. Here the packing structure of fibrin fibers is quantitatively assessed by combining Small Angle X-ray Scattering (SAXS) measurements of fibrin reconstituted under a wide range of conditions with computational molecular modeling of fibrin protofibrils. The number, positions, and intensities of the Bragg peaks observed in the SAXS experiments were reproduced computationally based on the all-atom molecular structure of reconstructed fibrin protofibrils. Specifically, the model correctly predicts the intensities of the reflections of the 22.5 nm axial repeat, corresponding to the half-staggered longitudinal arrangement of fibrin molecules. In addition, the SAXS measurements showed that protofibrils within fibrin fibers have a partially ordered lateral arrangement with a characteristic transverse repeat distance of 13 nm, irrespective of the fiber thickness. These findings provide fundamental insights into the molecular structure of fibrin clots that underlies their biological and physical properties.

  • 36.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    A Hybrid MPI+PGAS Approach to Improve Strong Scalability Limits of Finite Element Solvers. 2020. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, Institute of Electrical and Electronics Engineers (IEEE), 2020, pp. 303-313. Conference paper (Refereed)
    Abstract [en]

    Current finite element codes scale reasonably well as long as each core has a sufficient amount of local work to balance communication costs. However, achieving efficient performance at exascale will require unreasonably large problem sizes, in particular for low-order methods, where the small amount of work per element is already a limiting factor on current post-petascale machines. Key bottlenecks for these methods are sparse matrix assembly, where communication latency starts to limit performance as the number of cores increases, and linear solvers, where efficient overlapping is necessary to amortize the communication and synchronization cost of sparse matrix-vector multiplication and dot products. We present our work on improving the strong scalability limits of message-passing-based general low-order finite element solvers. Using lightweight one-sided communication offered by partitioned global address space (PGAS) languages, we demonstrate that the performance-critical, latency-sensitive sparse matrix assembly can achieve almost an order of magnitude better scalability. Linear solvers are also addressed via a signaling put algorithm for low-cost point-to-point synchronization, achieving performance similar to that of message-passing-based linear solvers. We introduce a new hybrid MPI+PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend with a new library written in Unified Parallel C (UPC). A detailed description of the implementation and the hybrid interface to FEniCS is given, and the feasibility of the approach is demonstrated via a performance study of the hybrid implementation on Cray XC40 machines.

  • 37.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Improving Strong Scalability Limits of Finite Element Based Solvers. 2019. Conference paper (Refereed)
    Abstract [en]

    Current finite element codes scale reasonably well as long as each core has a sufficient amount of local work to balance communication costs. However, achieving efficient performance at exascale will require unreasonably large problem sizes, in particular for low-order methods, where the small amount of work per element is already a limiting factor on current post-petascale machines. One of the key bottlenecks for these methods is sparse matrix assembly, where communication latency starts to limit performance as the number of cores increases. We present our work on improving the strong scalability limits of message-passing-based general low-order finite element solvers. Using lightweight one-sided communication, we demonstrate that performance-critical, latency-sensitive kernels can achieve almost an order of magnitude better scalability. We introduce a new hybrid MPI/PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend with a new library written in UPC. A detailed description of the implementation and the hybrid interface to FEniCS is given, and we present a detailed performance study of the hybrid implementation on Cray XC40 machines.

  • 38.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Spectral Element Simulations on the NEC SX-Aurora TSUBASA. 2021. In: HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2021. Conference paper (Refereed)
    Abstract [en]

    Following the recent transition in the high performance computing landscape to more heterogeneous architectures, application developers are faced with the challenge of ensuring good performance across a diverse set of platforms. In this paper, we present our work on porting the spectral element code Nek5000 to the recent vector architecture SX-Aurora TSUBASA. Using Nek5000's mini-app Nekbone, we formulate suitable loop transformations in key kernels, allowing for better vectorization and increasing the baseline performance by a factor of six. Using the new transformations, we demonstrate that the main compute-intensive matrix-vector and matrix-matrix multiplication kernels achieve close to half the peak performance of an SX-Aurora core. Our work also addresses the gather-scatter operations, a key kernel for efficient matrix-free spectral element formulations. We introduce a new implementation of Nek5000's gather-scatter library with mesh topology awareness for improved vectorization via exploitation of the SX-Aurora's hardware gather-scatter instructions, improving performance by up to 116%. A detailed description of the implementation is given together with a performance study, comparing both single-node performance and strong scalability characteristics when running across multiple SX-Aurora cards.

  • 39.
    Jansson, Niclas
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Karp, Martin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Perez, Adalberto
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Mukha, Timofey
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Ju, Yi
    Max Planck Computing and Data Facility, Garching, Germany.
    Liu, Jiahui
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Pall, Szilard
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Laure, Erwin
    Max Planck Computing and Data Facility, Garching, Germany.
    Weinkauf, Tino
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schumacher, Jörg
    Technische Universität Ilmenau, Ilmenau, Germany.
    Schlatter, Philipp
    Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Germany.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations. 2023. In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM), 2023, pp. 1-9, article id 5. Conference paper (Refereed)
    Abstract [en]

    We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.

  • 40.
    Jansson, Niclas
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Karp, Martin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Programvaruteknik och datorsystem, SCS.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik, Turbulent simulations laboratory.
    Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. 2024. In: Computers &amp; Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, pp. 106243-106243, article id 106243. Article in journal (Refereed)
    Abstract [en]

    Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area of great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive, as the range of active spatial and temporal scales widens quickly. Scale-resolving simulations in particular, including large-eddy simulations (LES) and direct numerical simulations (DNS), thus need to rely on modern, efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to refactor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

  • 41.
    Ju, Yi
    et al.
    Max Planck Computing and Data Facility, Max Planck Computing and Data Facility.
    Li, Mingshuai
    Technical University of Munich, Technical University of Munich.
    Perez, Adalberto
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Bellentani, Laura
    CINECA, Cineca.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik. Friedrich-Alexander-Universität Erlangen-Nürnberg.
    Laure, Erwin
    Max Planck Computing and Data Facility, Max Planck Computing and Data Facility.
    In-Situ Techniques on GPU-Accelerated Data-Intensive Applications. 2023. In: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper (Refereed)
    Abstract [en]

    The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges for applications such as Molecular Dynamics (MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of data for further visualization or analysis. At the same time, checkpointing is crucial for long runs on HPC clusters, due to limited walltimes and/or failures of system components, and typically requires the storage of large amounts of data. Thus, restricted IO performance and storage capacity can lead to bottlenecks for the performance of full application workflows (as compared to computational kernels without IO). In-situ techniques, where data is further processed while still in memory rather than being written out via the IO subsystem, can help to tackle these problems. In contrast to traditional post-processing methods, in-situ techniques can reduce or avoid the need to write or read data via the IO subsystem. They offer a promising approach for applications aiming to leverage the full power of large-scale HPC systems. In-situ techniques can also be applied to hybrid computational nodes on HPC systems consisting of graphics processing units (GPUs) and central processing units (CPUs). Within a node, the GPUs have significant performance advantages over the CPUs. Therefore, current approaches for GPU-accelerated applications often focus on maximizing GPU usage, leaving CPUs underutilized. In-situ tasks that use CPUs to perform data analysis or preprocess data concurrently with the running simulation offer a way to reduce this underutilization.

  • 42.
    Karmakar, Debjani
    et al.
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden; Technical Physics Division, Bhabha Atomic Research Centre, Mumbai 400085, India.
    Pereiro, Manuel
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Hasan, Md Nur
    Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block JD, Sector-III, SaltLake, Kolkata 700 106, India.
    Bharati, Ritadip
    School of Physical Sciences, National Institute of Science Education and Research, Homi Bhabha National Institute (HBNI), Jatni, 752050 Odisha, India.
    Hellsvik, Johan
    KTH, Skolan för teknikvetenskap (SCI), Fysik, Kondenserade materiens teori. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Delin, Anna
    KTH, Skolan för teknikvetenskap (SCI), Tillämpad fysik, Material- och nanofysik. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Pal, Samir Kumar
    Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block JD, Sector-III, SaltLake, Kolkata 700 106, India.
    Bergman, Anders
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Sharma, Shivalika
    Asia Pacific Center for Theoretical Physics, Pohang 37673, Republic of Korea.
    Di Marco, Igor
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden; Asia Pacific Center for Theoretical Physics, Pohang 37673, Republic of Korea; Department of Physics, Pohang University of Science and Technology, Pohang 37673, Republic of Korea.
    Thunström, Patrik
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Oppeneer, Peter M.
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Eriksson, Olle
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Magnetism in AV3Sb5 (A=Cs, Rb, K): Complex landscape of dynamical magnetic textures. 2023. In: Physical Review B, ISSN 2469-9950, E-ISSN 2469-9969, Vol. 108, no. 17, article id 174413. Article in journal (Refereed)
    Abstract [en]

    We have investigated the dynamical magnetic properties of the V-based kagome stibnite compounds by combining the ab initio-extracted magnetic parameters of a spin-Hamiltonian, like inter-site exchange parameters, magnetocrystalline anisotropy and site projected magnetic moments, with full-fledged simulations of atomistic spin dynamics. Our calculations reveal that, in addition to a ferromagnetic order along the [001] direction, the system hosts a complex landscape of magnetic configurations comprised of commensurate and incommensurate spin spirals along the [010] direction. The presence of such chiral magnetic textures may be the key toward solving the mystery about the origin of the experimentally observed inherent breaking of the C6 rotational, mirror, and the time-reversal symmetry.

  • 43.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Markidis, Stefano
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Optimization of Tensor-product Operations in Nekbone on GPUs. 2020. Conference paper (Refereed)
    Abstract [en]

    In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and optimize the main tensor-product operation in Nekbone further. Our optimization is done in CUDA and uses a different, 2D, thread structure to make the computations layer by layer. This enables us to use loop unrolling as well as utilize registers and shared memory efficiently. Our implementation is then compared on both the Pascal and Volta GPU architectures to previous GPU versions of Nekbone as well as a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of the peak performance for both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.

  • 44.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Reducing Communication in the Conjugate Gradient Method: A Case Study on High-Order Finite Elements. 2022. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022, Association for Computing Machinery (ACM), 2022, article id 2. Conference paper (Refereed)
    Abstract [en]

    Currently, a major bottleneck for several scientific computations is communication, both communication between different processors, so-called horizontal communication, and vertical communication between different levels of the memory hierarchy. With this bottleneck in mind, we target a notoriously communication-bound solver at the core of many high-performance applications, namely the conjugate gradient method (CG). To reduce the communication we present lower bounds on the vertical data movement in CG and go on to make a CG solver with reduced data movement. Using our theoretical analysis we apply our CG solver on a high-performance discretization used in practice, the spectral element method (SEM). Guided by our analysis, we show that for the Poisson equation on modern GPUs we can improve the performance by 30% by both rematerializing the discrete system and by reformulating the system to work on unique degrees of freedom. In order to investigate how horizontal communication can be reduced, we compare CG to two communication-reducing techniques, namely communication-avoiding and pipelined CG. We strong scale up to 4096 CPU cores and showcase performance improvements of upwards of 70% for pipelined CG compared to standard CG when applied on SEM at scale. We show that in addition to improving the scaling capabilities of the solver, initial measurements indicate that the convergence of SEM is largely unaffected by pipelined CG.

  • 45.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Liu, Felix
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). Raysearch Laboratories.
    Stanly, Ronith
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Rezaeiravesh, Saleh
    The University of Manchester, Manchester, United Kingdom.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik. Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Erlangen, Germany.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Uncertainty Quantification of Reduced-Precision Time Series in Turbulent Channel Flow. 2023. In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM), 2023, pp. 387-390. Conference paper (Refereed)
    Abstract [en]

    With increased computational power through the use of low-precision arithmetic, a relevant question is how lower precision affects simulation results, especially for chaotic systems where analytical round-off estimates are non-trivial to obtain. In this work, we consider how the uncertainty of the time series of a direct numerical simulation of turbulent channel flow at Re_τ = 180 is affected when restricted to a reduced-precision representation. We utilize a non-overlapping batch means estimator and find that the mean statistics can, in this case, be obtained with significantly fewer mantissa bits than conventional IEEE-754 double precision, but that the mean values are observed to be more sensitive in the middle of the channel than in the near-wall region. This indicates that using lower precision in the near-wall region, where the majority of the computational efforts are required, may benefit from low-precision floating point units found in upcoming computer hardware.

  • 46.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Massaro, Daniele
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. PDC Centre for High Performance Computing, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Hart, Alistair
    Hewlett Packard Enterprise (HPE), UK.
    Wahlgren, Jacob
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Large-scale direct numerical simulations of turbulence using GPUs and modern Fortran. 2023. In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846. Journal article (Refereed)
    Abstract [en]

    We present our approach to making direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world’s first direct numerical simulation of the flow around a Flettner rotor at Re = 30,000 and its interaction with a turbulent boundary layer. We present a performance comparison between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency based on readings from on-chip energy sensors.

  • 47.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Kenter, Tobias
    Paderborn University.
    Plessl, Christian
    Paderborn University.
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik. KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Appendix to High-Performance Spectral Element Methods on Field-Programmable Gate Arrays. 2020. Other (Other academic)
    Abstract [en]

    In this Appendix we display some results we omitted from our article "High-Performance Spectral Element Methods on Field-Programmable Gate Arrays". In particular, we showcase the measured bandwidth for the FPGA we used (Stratix 10) as well as the performance of our accelerator at different stages of optimization. In addition, we illustrate more practical aspects of our performance/resource modeling.

    Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the comforts of general-purpose architectures in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a good balance between complexity and performance. In this paper, we study modern FPGAs' applicability for use in accelerating the Spectral Element Method (SEM), which is core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator that we empirically evaluate on the latest Stratix 10 SX-series FPGAs and position its performance (and power efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM accelerator, which we use to project the performance and role of future FPGAs to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

  • 48.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Podobas, Artur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Kenter, Tobias
    Paderborn University.
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Plessl, Christian
    Paderborn University.
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges. 2022. In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2022, pp. 125-136. Conference paper (Refereed)
    Abstract [en]

    The impending termination of Moore’s law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand.

    In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work – which often focuses on accelerating small kernels – we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and an upwards of 2x reduction in runtime for the local evaluation of the Laplace operator.

    We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.

  • 49.
    Karp, Martin
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Suarez, Estela
    Forschungszentrum Julich GmbH, Juelich Supercomputing Centre; Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Informatik.
    Meinke, Jan
    Forschungszentrum Julich GmbH, Juelich Supercomputing Centre.
    Andersson, Måns
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik, Turbulent simulations laboratory.
    Markidis, Stefano
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Niclas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures. Manuscript (preprint) (Other academic)
    Abstract [en]

    The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between GPUs and CPUs. We investigate several different flow cases and computer systems based on the MSA. We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. For a smaller supercomputer where the computation takes significant amounts of time on the CPU module, it can be beneficial to also use a GPU module to decrease the execution time significantly.

  • 50.
    Khan, Monsurul
    et al.
    Purdue Univ, Dept Mech Engn, Indiana, PA 47905 USA.
    More, Rishabh V.
    Purdue Univ, Dept Mech Engn, Indiana, PA 47905 USA; MIT, Dept Mech Engn, Cambridge, MA 02139 USA.
    Banaei, Arash Alizad
    KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik. KTH, Skolan för elektroteknik och datavetenskap (EECS), Centra, Parallelldatorcentrum, PDC.
    Brandt, Luca
    KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för teknikvetenskap (SCI), Teknisk mekanik, Strömningsmekanik och Teknisk Akustik.
    Ardekani, Arezoo M.
    Purdue Univ, Dept Mech Engn, Indiana, PA 47905 USA.
    Rheology of concentrated fiber suspensions with a load-dependent friction coefficient. 2023. In: Physical Review Fluids, E-ISSN 2469-990X, Vol. 8, no. 4, article id 044301. Journal article (Refereed)
    Abstract [en]

    We investigate the effects of fiber aspect ratio, roughness, flexibility, and volume fraction on the rheology of concentrated suspensions in a steady shear flow using direct numerical simulations. We model the fibers as inextensible continuous flexible slender bodies, with the Euler-Bernoulli beam equation governing their dynamics, suspended in an incompressible Newtonian fluid. The fiber dynamics and fluid flow coupling is achieved using the immersed boundary method. In addition, the fiber surface roughness might lead to interfiber contacts, resulting in normal and tangential forces between the fibers, which follow Coulomb's law of friction. The surface roughness is modeled as hemispherical protrusions on the fiber surfaces. Their deformation results in a normal load-dependent friction coefficient. Our simulations accurately predict the experimentally observed shear thinning in fiber suspensions. Furthermore, we find that the suspension viscosity η increases with increasing volume fraction, roughness, fiber rigidity, and aspect ratio. The increase in η is the macroscopic manifestation of a similar increase in the microscopic contact contribution to the total stress with these parameters. In addition, we observe positive and negative first N1 and second N2 normal stress differences, respectively, with |N2| < |N1|, in agreement with previous experiments. Last, we propose a modified Maron-Pierce law to quantify the reduction in the jamming volume fraction by increasing the fiber aspect ratio and roughness. Our results and analysis establish the use of fiber surface tribology to tune the suspension flow behavior.
