kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).ORCID iD: 0009-0004-9193-1253
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).ORCID iD: 0000-0001-5452-6794
KTH, School of Engineering Sciences (SCI), Engineering Mechanics.ORCID iD: 0000-0002-6384-2630
KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).ORCID iD: 0000-0003-0639-0639
2023 (English)In: Euro-Par 2022: Parallel Processing Workshops: Euro-Par 2022 International Workshops, Glasgow, UK, August 22–26, 2022, Revised Selected Papers, 2023, p. 80-92Conference paper, Published paper (Refereed)
Abstract [en]

Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for expressing Fourier Transform algorithms. We present the initial design, implementation, and preliminary results of FFTc.

Place, publisher, year, edition, pages
2023. p. 80-92
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-327027DOI: 10.1007/978-3-031-31209-0_6Scopus ID: 2-s2.0-85161396397OAI: oai:DiVA.org:kth-327027DiVA, id: diva2:1757681
Conference
Euro-Par 2022 International Workshops, Glasgow, UK, August 22–26, 2022
Note

QC 20231122

Available from: 2023-05-17 Created: 2023-05-17 Last updated: 2025-05-16Bibliographically approved
In thesis
1. Leveraging Intermediate Representations for High-Performance Portable Discrete Fourier Transform Frameworks: with Application to Molecular Dynamics
Open this publication in new window or tab >>Leveraging Intermediate Representations for High-Performance Portable Discrete Fourier Transform Frameworks: with Application to Molecular Dynamics
2023 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The Discrete Fourier Transform (DFT) and its improved formulations, the Fast Fourier Transforms (FFTs), are vital for scientists and engineers in a range of domains from signal processing to the solution of partial differential equations.  A growing trend in Scientific Computing is heterogeneous computing, where accelerators are used instead or together with CPUs. This has led to problems for developers in unifying portability, performance, and productivity. 

This thesis first motivates this work by showing the importance of having efficient DFT calculations, describes the DFT algorithm and a formulation based on matrix-factorizations which has been developed to formulate FFT algorithms and express their parallelism to exploit modern computer architectures, such as accelerators.

The first paper is a motivating study of the breakdown of the performance and scalability of the high-performance Molecular Dynamics code GROMACS where DFT calculations are a main performance bottleneck. In particular, the long-range interactions are solved with the Particle-Mesh Ewald algorithm which uses a three-dimensional Fast Fourier Transform. 

The two following papers present two approaches to leverage factorization with the help of two different frameworks using Intermediate Representation and compiler technology, for the development of fast and portable code. The second paper presents a front-end and a pipeline for code generation in a domain-specific language based on Multi-Level Intermediate Representation (MLIR) for developing Fast Fourier Transform libraries. The last paper investigates and optimizes an implementation of an important kernel within the matrix-factorization framework: the batched DFT. It is implemented with data-centric programming and a data-centric intermediate representation called Stateful Dataflow multi-graphs (SDFG). The paper evaluates strategies for complex-valued data layout for performance and portability and we show that there is a trade-off between portability and maintainability in using the native complex data type and that an SDFG-level abstraction could be beneficial for developing higher-level applications.

Abstract [sv]

Den diskreta Fouriertransformen och dess snabba implementeringar är viktiga för vetenskap och ingenjörskonst. Den har tillämningar i ämnen som singnal behnadling, lösning av partiella diffrentialekvationer och många andra ämnen inom vetenskapliga beräkningar. En växande trend inom ämnet är heterogena datorer där acceleratorer som är specialicerade till vissa beräkningar kan användas som stöd för traditionella processorer. Detta leder till problem med portabilitet, prestanda och produktivitet. 

Avhandligen inleds med att beskriva diskret Fouriertransform och ett ramverk för faktorisering till glesa strukturerade matriser som tillsammans representerar snabb Fouriertransform (FFT, Eng.) och som kan användas för att uttrycka parallelism i algorithmerna. För att motivera arbete med FFT i vetenskapliga beräkningar så utväreras den parallela prestandan av GROMACS: en kod för simulering av Molekyldynmik. GROMACS använder en tredimensionell diskret Fouriertransform för att finna den elektrostatiska potentialen med hjälp av Particle-Mesh Ewald-tekniken. 

De följande två artiklarna presenterar två olika ramverk för att utnyttja mellankod (IR Eng.) och kompilatorteknik, för utvecklandet av  snabb och portabel kod. Den andra artikeln beskriver arbetet att utveckla ett domänspecifikt språk baserat på Multi-Level Intermediate Representation för design av snabba Fouriertransformer baserat på matrisfaktorisering. Den sista artikeln undersöker och optimerar en viktig komponent för matrisfaktorisering av diskreta Fouriertransformen: att beräkna flera små diskreta Fouriertransformer parallelt. Detta är gjort med DaCe som är ramverk för data-centrisk programmering som använder en mellankod kallad SDFG. I artikeln utvärderas strategier för data format av komplexa tal för prestanda och portabilitet, och visar att en abstraktion med hjälp av SDFG kan motiveras. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. 38
Series
TRITA-EECS-AVL ; 2023:43
Keywords
Intermediate Representation, Discrete Fourier Transform, Fast Fourier Transform, Molecular Dynamics, Mellankod, diskret Fouriertransform, snabb Fouriertransform, Molekyldynamik
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-326223 (URN)978-91-8040-586-7 (ISBN)
Presentation
2023-06-09, VIC-studion, Lindstedtsvägen 9, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20230522

Available from: 2023-05-22 Created: 2023-05-19 Last updated: 2023-05-29Bibliographically approved
2. Domain-Specific Compilation Framework with High-Level Tensor Abstraction for Fast Fourier Transform and Finite-Difference Time-Domain Methods
Open this publication in new window or tab >>Domain-Specific Compilation Framework with High-Level Tensor Abstraction for Fast Fourier Transform and Finite-Difference Time-Domain Methods
2025 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the end of Dennard scaling, hardware performance improvements now stem from increased architectural complexity, which in turn demands more sophisticated programming models. Today’s computing landscape includes a broad spectrum of hardware targets—CPUs, GPUs, FPGAs, and domain-specific ASICs—each requiring substantial manual effort and low-level tuning to fully exploit their potential. Performance programming has evolved beyond traditional code optimization and increasingly depends on domain-specific compilers, constraint-solving frameworks, advanced performance models, and automatic or learned strategies for code generation.

Conventional implementations of numerical libraries often rely on handwritten, platform-specific kernels. While such kernels may achieve high performance for selected routines, they typically underperform in others, and their lack of portability results in high development overhead and performance bottlenecks. This impedes scalability across heterogeneous hardware systems.

To address these challenges, this thesis presents the design and implementation of end-to-end domain-specific compilers for numerical workloads, with a focus on applications such as Fast Fourier Transform (FFT) and Finite Difference Time Domain (FDTD) solvers. The proposed framework is built on the Multi-Level Intermediate Representation (MLIR) and Low-Level Virtual Machine (LLVM) infrastructures. It models compute kernels as operations on 3D tensor abstractions with explicit computational semantics. High-level optimizations—including loop tiling, fusion, and vectorization—are automatically applied by the compiler.

We evaluate the proposed code generation pipeline across diverse hardware platforms, including Intel, AMD, and ARM CPUs, as well as GPUs. Experimental results demonstrate the approach’s ability to deliver both high performance and portability across heterogeneous architectures.

Abstract [sv]

När Dennard-skalningen tog slut, började förbättringar i hårdvarans prestanda komma från mer komplexa arkitekturer. Detta kräver avancerade sätt att skriva program. Idag finns många olika typer av hårdvara — CPU:er, GPU:er, FPGA:er och specialbyggda chip (ASIC) — som alla kräver manuellt arbete och lågnivå-optimering för att fungera bra. Prestandaprogrammering handlar inte längre bara om att förbättra kod, utan bygger allt mer på domänspecifika kompilatorer, smarta prestandamodeller och automatiska metoder för att generera bra kod. Vanliga numeriska bibliotek bygger ofta på handskriven kod anpassad för en viss plattform. Sådan kod kan vara snabb för vissa delar, men är ofta långsam för andra och svår att flytta mellan olika system. Det gör utveckling dyr och skapar flaskhalsar som gör det svårt att använda kod på många typer av hårdvara. För att lösa dessa problem presenterar denna avhandling en metod för att bygga domäspecifika kompilatorer för numeriska beräkningar. Fokus ligger på två typer av metoder: Snabb Fouriertransform (FFT) och Finita Differens i Tidsdomän (FDTD). Ramverket bygger på MLIR (Multi-Level Intermediate Representation) och LLVM, och använder 3D-tensorer med explicit beräkningslogik. Kompilatorn gör automatiska optimeringar som loop-delning, sammanslagning och vektorisering.Vi testar detta på olika typer av hårdvara: CPU:er från Intel, AMD och ARM samt GPU:er. Resultaten visar både hög prestanda och god portabilitet mellan olika plattformar.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. vii, 70
Series
TRITA-EECS-AVL ; 2025:67
Keywords
Numerical Libraries, Fast Fourier Transform (FFT), Finite Difference Time Domain (FDTD), Domain-Specific Compilers, Multilevel Intermediate Representation (MLIR), Low-Level Virtual Machine (LLVM), Numeriska bibliotek, Snabb Fouriertransform (FFT), Ändlig differens i tidsdomän (FDTD), Domänspecifika kompilatorer, Flerskikts mellanrepresentation (MLIR), Lågnivå virtuell maskin (LLVM)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363493 (URN)978-91-8106-311-0 (ISBN)
Public defence
2025-06-12, Sal D3, Lindstedtsvägen 9, KTH Campus, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20250519

Available from: 2025-05-19 Created: 2025-05-16 Last updated: 2025-06-30Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopusarxiv

Authority records

He, YifeiPodobas, ArturAndersson, MånsMarkidis, Stefano

Search in DiVA

By author/editor
He, YifeiPodobas, ArturAndersson, MånsMarkidis, Stefano
By organisation
Computational Science and Technology (CST)Engineering MechanicsSeRC - Swedish e-Science Research Centre
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 374 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf