Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0009-0004-9193-1253
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0001-5452-6794
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0003-0639-0639
2024 (English). In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Springer Nature, 2024, Vol. 14351, p. 207-218. Conference paper, Published paper (Refereed)
Abstract [en]

FFTc is a Domain-Specific Language (DSL) for designing and generating Fast Fourier Transform (FFT) libraries. FFTc is unique in that it leverages and extends Multi-Level Intermediate Representation (MLIR) dialects to optimize FFT code generation. In this work, we present FFTc extensions and improvements, such as support for different data layouts for complex-valued arrays and sparsification to enable efficient vectorization, as well as seamless porting of FFT libraries to GPU systems. We show that, on CPUs, thanks to vectorization, the performance of FFTc-generated FFTs is comparable to that of FFTW, a state-of-the-art FFT library. We also present initial performance results for FFTc on Nvidia GPUs.
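
To make the phrase "different data layouts for complex-valued arrays" concrete, here is a minimal sketch contrasting an interleaved layout (re, im, re, im, ...) with a split layout (real and imaginary parts in separate arrays). This is not FFTc code and does not claim to reflect FFTc's internal representation: all function names are hypothetical, and the pure-Python loops only stand in for loops that a compiler might vectorize. In the split layout, each butterfly works on runs of same-kind values using real arithmetic only, which is one commonly cited reason such a layout can be friendlier to SIMD units.

```python
import cmath


def to_split(interleaved):
    """Convert [re0, im0, re1, im1, ...] into separate real and imaginary lists."""
    return list(interleaved[0::2]), list(interleaved[1::2])


def to_interleaved(re, im):
    """Convert separate real and imaginary lists back to [re0, im0, re1, im1, ...]."""
    out = []
    for r, i in zip(re, im):
        out.extend((r, i))
    return out


def fft_split(re, im):
    """Recursive radix-2 decimation-in-time FFT on split arrays.

    Assumes len(re) == len(im) and that the length is a power of two.
    """
    n = len(re)
    if n == 1:
        return list(re), list(im)
    even_re, even_im = fft_split(re[0::2], im[0::2])
    odd_re, odd_im = fft_split(re[1::2], im[1::2])
    out_re = [0.0] * n
    out_im = [0.0] * n
    half = n // 2
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        # Complex multiply w * odd[k], written with real arithmetic only.
        t_re = w.real * odd_re[k] - w.imag * odd_im[k]
        t_im = w.real * odd_im[k] + w.imag * odd_re[k]
        out_re[k] = even_re[k] + t_re
        out_im[k] = even_im[k] + t_im
        out_re[k + half] = even_re[k] - t_re
        out_im[k + half] = even_im[k] - t_im
    return out_re, out_im


def dft_reference(x):
    """Naive O(n^2) DFT on Python complex numbers, used only for checking."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

A caller holding interleaved data would convert once with `to_split`, run `fft_split`, and convert back with `to_interleaved` if the consumer expects the interleaved form; whether that conversion cost pays off depends on the transform size and the target hardware.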

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 14351, p. 207-218
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743 ; 14351
Keywords [en]
Automatic Loop Vectorization, FFTc, GPU Porting, LLVM, MLIR
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346538
DOI: 10.1007/978-3-031-50684-0_16
ISI: 001279250600016
Scopus ID: 2-s2.0-85192276218
OAI: oai:DiVA.org:kth-346538
DiVA, id: diva2:1858454
Conference
International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 2023 - Sep 1 2023
Note

QC 20240521

Available from: 2024-05-16 Created: 2024-05-16 Last updated: 2025-05-16. Bibliographically approved
In thesis
1. Domain-Specific Compilation Framework with High-Level Tensor Abstraction for Fast Fourier Transform and Finite-Difference Time-Domain Methods
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the end of Dennard scaling, hardware performance improvements now stem from increased architectural complexity, which in turn demands more sophisticated programming models. Today’s computing landscape includes a broad spectrum of hardware targets—CPUs, GPUs, FPGAs, and domain-specific ASICs—each requiring substantial manual effort and low-level tuning to fully exploit their potential. Performance programming has evolved beyond traditional code optimization and increasingly depends on domain-specific compilers, constraint-solving frameworks, advanced performance models, and automatic or learned strategies for code generation.

Conventional implementations of numerical libraries often rely on handwritten, platform-specific kernels. While such kernels may achieve high performance for selected routines, they typically underperform in others, and their lack of portability results in high development overhead and performance bottlenecks. This impedes scalability across heterogeneous hardware systems.

To address these challenges, this thesis presents the design and implementation of end-to-end domain-specific compilers for numerical workloads, with a focus on applications such as Fast Fourier Transform (FFT) and Finite Difference Time Domain (FDTD) solvers. The proposed framework is built on the Multi-Level Intermediate Representation (MLIR) and Low-Level Virtual Machine (LLVM) infrastructures. It models compute kernels as operations on 3D tensor abstractions with explicit computational semantics. High-level optimizations—including loop tiling, fusion, and vectorization—are automatically applied by the compiler.
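
As a rough illustration of two of the optimizations named above, the sketch below shows loop fusion and loop tiling done by hand. It is not taken from the thesis and does not represent the compiler's actual transformations or its tensor abstraction: the function names, the tile size, and the choice of kernels are assumptions made for the example.

```python
def scale_then_add_unfused(a, b, alpha):
    """Two separate loops: the first writes a temporary, the second reads it back."""
    tmp = [alpha * x for x in a]
    return [t + y for t, y in zip(tmp, b)]


def scale_then_add_fused(a, b, alpha):
    """Fused version: one pass over the data and no temporary array."""
    return [alpha * x + y for x, y in zip(a, b)]


def transpose_tiled(m, tile=2):
    """Transpose a rectangular list-of-lists matrix, visiting it tile by tile.

    Each (i0, j0) pair selects a tile of at most tile x tile elements, so the
    inner loops touch a small block of the input and output at a time.
    """
    rows, cols = len(m), len(m[0])
    out = [[0] * rows for _ in range(cols)]
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j][i] = m[i][j]
    return out
```

Both rewrites keep the results unchanged and only alter the order or grouping of memory accesses, which is the property that allows a compiler to apply such transformations automatically.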

We evaluate the proposed code generation pipeline across diverse hardware platforms, including Intel, AMD, and ARM CPUs, as well as GPUs. Experimental results demonstrate the approach’s ability to deliver both high performance and portability across heterogeneous architectures.

Abstract [sv]

When Dennard scaling ended, improvements in hardware performance began to come from more complex architectures. This requires advanced ways of writing programs. Today there are many different types of hardware (CPUs, GPUs, FPGAs, and purpose-built ASIC chips), all of which require manual work and low-level optimization to perform well. Performance programming is no longer only about improving code; it relies increasingly on domain-specific compilers, smart performance models, and automatic methods for generating good code. Common numerical libraries are often built on handwritten code adapted to a particular platform. Such code can be fast for some parts, but it is often slow for others and hard to move between systems. This makes development expensive and creates bottlenecks that make it difficult to use code on many types of hardware. To solve these problems, this thesis presents a method for building domain-specific compilers for numerical computations. The focus is on two types of methods: the Fast Fourier Transform (FFT) and Finite Difference Time Domain (FDTD). The framework builds on MLIR (Multi-Level Intermediate Representation) and LLVM, and uses 3D tensors with explicit computational logic. The compiler performs automatic optimizations such as loop tiling, fusion, and vectorization. We test this on different types of hardware: CPUs from Intel, AMD, and ARM, as well as GPUs. The results show both high performance and good portability across platforms.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. vii, 70
Series
TRITA-EECS-AVL ; 2025:67
Keywords
Numerical Libraries, Fast Fourier Transform (FFT), Finite Difference Time Domain (FDTD), Domain-Specific Compilers, Multilevel Intermediate Representation (MLIR), Low-Level Virtual Machine (LLVM)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363493 (URN)
978-91-8106-311-0 (ISBN)
Public defence
2025-06-12, Sal D3, Lindstedtsvägen 9, KTH Campus, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20250519

Available from: 2025-05-19 Created: 2025-05-16 Last updated: 2025-05-20. Bibliographically approved
Authority records

He, Yifei; Podobas, Artur; Markidis, Stefano