kth.sePublications KTH
Change search
Link to record
Permanent link

Direct link
Hjort Blindell, GabrielORCID iD iconorcid.org/0000-0001-6794-6413
Alternative names
Publications (10 of 12) Show all publications
Castañeda Lozano, R., Carlsson, M., Hjort Blindell, G. & Schulte, C. (2019). Combinatorial register allocation and instruction scheduling. ACM Transactions on Programming Languages and Systems, 41(3), Article ID 17.
Open this publication in new window or tab >>Combinatorial register allocation and instruction scheduling
2019 (English)In: ACM Transactions on Programming Languages and Systems, ISSN 0164-0925, E-ISSN 1558-4593, Vol. 41, no 3, article id 17Article in journal (Refereed) Published
Abstract [en]

This article introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1,000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain. An extensive evaluation confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1.1% to 10% and mean code size reduction ranges from 1.3% to 3.8% for the different architectures. A significant part of this improvement is due to the integrated nature of the approach. Executing the generated code on Hexagon confirms that the estimated speedup results in actual speedup. Given a fixed time limit, Unison solves optimally functions of up to 946 instructions, nearly an order of magnitude larger than previous approaches. The results show that our combinatorial approach can be applied in practice to trade compilation time for code quality beyond the usual compiler optimization levels, identify improvement opportunities in heuristic algorithms, and fully exploit processor-specific features.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Combinatorial optimization, Instruction scheduling, Register allocation
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-262613 (URN)10.1145/3332373 (DOI)000501479500004 ()2-s2.0-85068443848 (Scopus ID)
Note

QC 20191024

Not duplicate with DiVA 1232469

Available from: 2019-10-24 Created: 2019-10-24 Last updated: 2024-03-18Bibliographically approved
Castañeda Lozano, R., Carlsson, M., Hjort Blindell, G. & Schulte, C. (2018). Combinatorial Register Allocation and Instruction Scheduling.
Open this publication in new window or tab >>Combinatorial Register Allocation and Instruction Scheduling
2018 (English)Report (Other academic)
Abstract [en]

This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain.

An extensive evaluation of estimated speed, code size, and scalability confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1% to 9.3% and mean code size reduction ranges from 0.8% to 3.9% for the different architectures. Executing the generated code on Hexagon confirms that the estimated speedup indeed results in actual speedup. Given a fixed time limit, Unison solves optimally functions of up to 647 instructions, delivers improved solutions for functions of up to 874 instructions, and achieves more than 85% of the potential speed for 90% of the functions on Hexagon.

The results in this paper show that our combinatorial approach can be used in practice to trade compilation time for code quality beyond the usual compiler optimization levels, fully exploit processor-specific features, and identify improvement opportunities in existing heuristic algorithms.

National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-232118 (URN)
Note

QC 20180822

Available from: 2018-07-11 Created: 2018-07-11 Last updated: 2024-03-15Bibliographically approved
Hjort Blindell, G. (2018). Universal Instruction Selection. (Doctoral dissertation). KTH Royal Institute of Technology
Open this publication in new window or tab >>Universal Instruction Selection
2018 (English)Doctoral thesis, monograph (Other academic)
Abstract [en]

In code generation, instruction selection chooses instructions to implement a given program under compilation, global code motion moves computations from one part of the program to another, and block ordering places program blocks in a consecutive sequence. Local instruction selection chooses instructions one program block at a time while global instruction selection does so for the entire function. This dissertation introduces a new approach called universal instruction selection that integrates global instruction selection with global code motion and block ordering. By doing so, it addresses limitations of existing instruction selection techniques that fail to exploit many of the instructions provided by modern processors.

To handle the combinatorial nature of these problems, the approach is based on constraint programming, a combinatorial optimization method. It relies on a novel model that is simpler and more flexible compared to the techniques used in modern compilers and that captures crucial features ignored by other combinatorial approaches. The dissertation also proposes extensions to the model for integrating instruction scheduling and register allocation, two other important problems of code generation.

The model is enabled by a novel, graph-based representation that unifies data and control flow for entire functions. The representation is crucial for integrating instruction selection with global code motion and for modeling sophisticated instructions, whose behavior contains both data and control flow, as graphs.

Through experimental evaluation, universal instruction selection is demonstrated to handle architectures with a rich instruction set and scales up to functions with hundreds of operations. For these functions, it generates code of equal or better quality compared to the state of the art. The dissertation also demonstrates that there is sufficient data parallelism to be exploited through selection of SIMD instructions and that this exploitation benefits from global code motion. With these results, it is argued that constraint programming is a flexible, practical, competitive, and extensible approach for combining global instruction selection, global code motion, and block ordering.

Abstract [sv]

Inom kodgenerering väljer instruktionsselektering (eng. instruction selection) instruktioner för att implementera ett givet program under kompilering, global kodförflyttning (eng. global code motion) flyttar beräkningar från en del av programmet till en annan, och blockläggning (eng. block ordering) placerar programblock i en sekventiell följd. Lokal instruktionsselektering väljer instruktioner ett programblock i taget medan global instruktionsselektering gör så för hela funktionen. Denna avhandling introducerar en ny metod, kallad universell instruktionsselektering, som integrerar global instruktionsselektering med global kodförflyttning och blockläggning. På så vis åtgärdar den begränsningar hos befintliga instruktionsselekteringsmetoder som misslyckas med att utnyttja många av instruktionerna som erbjuds av moderna processorer.

För att hantera den kombinatoriska naturen av dessa problem tillämpas villkorsprogrammering (eng. constraint programming), en teknik för kombinatorisk optimering. Metoden använder en innovativ model som är enklare och mer flexibel jämfört med metoderna som används i moderna kompilatorer och som fångar viktiga särdrag som ignoreras av andra kombinatoriska metoder. Avhandlingen föreslår också utökningar av modellen för att integrera instruktionsschemaläggning (eng. instruction scheduling) och registerallokering (eng. register allocation), två andra viktiga kodgenereringsproblem.

Modellen möjliggörs av en innovativ, grafbaserad representation som förenar data- och kontrollflöde för hela funktioner. Representationen är avgörande för att integrera instruktionsselektering med global kodförflyttning och för att modellera sofistikerade instruktioner, vars beteende omfattar både data- och kontrollflöde, som grafer.

Genom experimentell utvärdering visas att universell instruktionsselektering kan hantera arkitekturer med ett rikt instruktionsset och skalar upp till funktioner med hundratals beräkningar. För dessa funktioner genererar den kod av motsvarande eller bättre kvalitet än den senaste tekniken. Avhandlingen visar också att det finns tillräckligt med dataparallellism att utnyttja genom selektering av SIMD-instruktioner och att denna exploatering gynnas av global kodförflyttning. Med dessa resultat argumenteras för att villkorsprogrammering är en flexibel, praktisk, konkurrenskraftig, och utökningsbar metod för att kombinera global instruktionsselektering, global kodförflyttning, och blockläggning.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018. p. 314
Series
TRITA-EECS-AVL ; 2018:11
Keywords
instruction selection, code generation, compilers, constraint programming, combinatorial optimization
National Category
Computer Sciences
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-223599 (URN)978-91-7729-683-6 (ISBN)
Public defence
2018-04-06, Sal B, Electrum, Kungliga Tekniska Högskolan, Kistagången 16, Kista, Stockholm, 13:15 (English)
Opponent
Supervisors
Note

QC 20180226

Available from: 2018-02-26 Created: 2018-02-23 Last updated: 2022-06-26Bibliographically approved
Hjort Blindell, G., Carlsson, M., Castañeda Lozano, R. & Schulte, C. (2017). Complete and Practical Universal Instruction Selection. Paper presented at International Conferences on Compilers, Architectures and Synthesis for Embedded Systems. ACM Transactions on Embedded Computing Systems
Open this publication in new window or tab >>Complete and Practical Universal Instruction Selection
2017 (English)In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465Article in journal (Refereed) Published
Abstract [en]

In code generation, instruction selection chooses processor instructions to implement a program under compilation where code quality crucially depends on the choice of instructions. Using methods from combinatorial optimization, this paper proposes an expressive model that integrates global instruction selection with global code motion.The model introduces (1) handling of memory computations and function calls, (2) a method for inserting additional jump instructions where necessary, (3) a dependency-based technique to ensure correct combinations of instructions, (4) value reuse to improve code quality, and (5) an objective function that reduces compilation time and increases scalability by exploiting bounding techniques. The approach is demonstrated to be complete and practical, competitive with LLVM, and potentially optimal (w.r.t. the model) for medium-sized functions. The results show that combinatorial optimization for instruction selection is well-suited to exploit the potential of modern processors in embedded systems.

Place, publisher, year, edition, pages
ACM Press, 2017
Keywords
instruction selection, code generation, constraint programming, combinatorial optimization
National Category
Computer Systems Embedded Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-213968 (URN)10.1145/3126528 (DOI)000414353800002 ()2-s2.0-85030692980 (Scopus ID)
Conference
International Conferences on Compilers, Architectures and Synthesis for Embedded Systems
Funder
Swedish Research Council, 621-2011-6229
Note

QC 20170908

Available from: 2017-09-07 Created: 2017-09-07 Last updated: 2022-06-27Bibliographically approved
Hjort Blindell, G. (2016). Instruction Selection: Principles, Methods, & Applications. Springer
Open this publication in new window or tab >>Instruction Selection: Principles, Methods, & Applications
2016 (English)Book (Refereed)
Abstract [en]

This book presents a comprehensive, structured, up-to-date survey on instruction selection. The survey is structured according to two dimensions: approaches to instruction selection from the past 45 years are organized and discussed according to their fundamental principles and according to the characteristics of the supported machines instructions. The fundamental principles are macro expansion, tree covering, DAG covering, and graph covering. The machine instruction characteristics introduced are single-output, multi-output, disjoint-output, inter-block, and interdependent machine instructions. The survey also examines problems that have yet to be addressed by existing approaches.

The book is suitable for advanced undergraduate students in computer science, graduate students, practitioners, and researchers.

Place, publisher, year, edition, pages
Springer, 2016
Keywords
instruction selection, compilers, code generation, survey
National Category
Computer Systems Embedded Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-190112 (URN)10.1007/978-3-319-34019-7 (DOI)2-s2.0-85017634248 (Scopus ID)978-3-319-34019-7 (ISBN)
Funder
Swedish Research Council, VR 621-2011-6229
Note

QC 20160810

Available from: 2016-08-09 Created: 2016-08-09 Last updated: 2022-06-22Bibliographically approved
Castañeda Lozano, R., Carlsson, M., Hjort Blindell, G. & Schulte, C. (2016). Register allocation and instruction scheduling in Unison. In: Proceedings of CC 2016: The 25th International Conference on Compiler Construction. Paper presented at 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, 17 March 2016 through 18 March 2016 (pp. 263-264). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Register allocation and instruction scheduling in Unison
2016 (English)In: Proceedings of CC 2016: The 25th International Conference on Compiler Construction, Association for Computing Machinery (ACM), 2016, p. 263-264Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes Unison, a simple, flexible, and potentially optimal software tool that performs register allocation and instruction scheduling in integration using combinatorial optimization. The tool can be used as an alternative or as a complement to traditional approaches, which are fast but complex and suboptimal. Unison is most suitable whenever high-quality code is required and longer compilation times can be tolerated (such as in embedded systems or library releases), or the targeted processors are so irregular that traditional compilers fail to generate satisfactory code.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2016
Keywords
Combinatorial optimization, Instruction scheduling, Register allocation
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-183393 (URN)10.1145/2892208.2892237 (DOI)000389808800026 ()2-s2.0-84966560429 (Scopus ID)9781450342414 (ISBN)
Conference
25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, 17 March 2016 through 18 March 2016
Note

QC 20161122

Available from: 2016-03-09 Created: 2016-03-09 Last updated: 2024-03-15Bibliographically approved
Hjort Blindell, G., Menne, C. & Sander, I. (2016). Synthesizing Code for GPGPUs from abstract formal models. In: 16th Conference on Languages, Design Methods, and Tools for Electronic System Design, FDL 2014: . Paper presented at 14 October 2014 through 16 October 2014 (pp. 115-134). Springer
Open this publication in new window or tab >>Synthesizing Code for GPGPUs from abstract formal models
2016 (English)In: 16th Conference on Languages, Design Methods, and Tools for Electronic System Design, FDL 2014, Springer, 2016, p. 115-134Conference paper, Published paper (Refereed)
Abstract [en]

Today multiple frameworks exist for elevating the task of writing programs for GPGPUs, which are massively data-parallel execution platforms. These are needed as writing correct and high-performing applications for GPGPUs is notoriously difficult due to the intricacies of the underlying architecture. However, the existing frameworks lack a formal foundation that makes them difficult to use together with formal verification, testing, and design space exploration. We present in this chapter a novel software synthesis tool—called f2cc—which is capable of generating efficient GPGPU code from abstract formal models based on the synchronous model of computation. These models can be built using high-level modeling methodologies that hide low-level architecture details from the developer. The correctness of the tool has been experimentally validated on models derived from two applications. The experiments also demonstrate that the synthesized GPGPU code yielded a 28× speedup when executed on a graphics card with 96 cores and compared against a sequential version that uses only the CPU.

Place, publisher, year, edition, pages
Springer, 2016
Keywords
Architectural design, Codes (symbols), Computational linguistics, Program processors, Systems analysis, Design space exploration, Formal foundation, Graphics card, High-level modeling, High-performing applications, Software synthesis, Synchronous models, Writing projects, Design
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-181119 (URN)10.1007/978-3-319-24457-0_7 (DOI)2-s2.0-84952775263 (Scopus ID)
Conference
14 October 2014 through 16 October 2014
Note

Part of proceedings: ISBN 978-3-319-24455-6

Not duplicate with DiVA 758857

QC 20230206

Available from: 2016-02-09 Created: 2016-01-29 Last updated: 2023-02-06Bibliographically approved
Hjort Blindell, G., Castañeda Lozano, R., Carlsson, M. & Schulte, C. (2015). Modeling Universal Instruction Selection. In: Gilles Pesant (Ed.), Principles and Practice of Constraint Programming: 21st International Conference, CP 2015, Cork, Ireland, August 31 -- September 4, 2015, Proceedings. Paper presented at 21st International Conference on Principles and Practice of Constraint Programming,August 31-September 4, 2015, Ireland (pp. 609-626). Springer, 9255
Open this publication in new window or tab >>Modeling Universal Instruction Selection
2015 (English)In: Principles and Practice of Constraint Programming: 21st International Conference, CP 2015, Cork, Ireland, August 31 -- September 4, 2015, Proceedings / [ed] Gilles Pesant, Springer, 2015, Vol. 9255, p. 609-626Conference paper, Published paper (Refereed)
Abstract [en]

Instruction selection implements a program under compilation by selecting  processor instructions and has tremendous impact on the performance of the  code generated by a compiler. This paper introduces a graph-based  universal representation that unifies data and control flow for both  programs and processor instructions. The representation is the essential  prerequisite for a constraint model for instruction selection introduced in  this paper. The model is demonstrated to be expressive in that it supports many processor  features that are out of reach of state-of-the-art approaches, such as  advanced branching instructions, multiple register banks, and SIMD  instructions. The resulting model can be solved for small to medium size  input programs and sophisticated processor instructions and is competitive with LLVM in code quality. Model and representation are significant due to their  expressiveness and their potential to be combined with models for other code  generation tasks.

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science ; 9255
Keywords
instruction selection, global code motion, code generation, optimizing compilers, program representations, modern processor architectures
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-169684 (URN)10.1007/978-3-319-23219-5_42 (DOI)000364707100042 ()2-s2.0-84944558675 (Scopus ID)978-3-319-23219-5 (ISBN)
Conference
21st International Conference on Principles and Practice of Constraint Programming,August 31-September 4, 2015, Ireland
Funder
Swedish Research Council, VR 621-2011-6229
Note

QC 20150828

Available from: 2015-06-22 Created: 2015-06-22 Last updated: 2022-06-23Bibliographically approved
Lozano, R. C., Carlsson, M., Hjort Blindell, G. & Schulte, C. (2014). Combinatorial Spill Code Optimization and Ultimate Coalescing. SIGPLAN notices, 49(5), 23-32
Open this publication in new window or tab >>Combinatorial Spill Code Optimization and Ultimate Coalescing
2014 (English)In: SIGPLAN notices, ISSN 0362-1340, E-ISSN 1558-1160, Vol. 49, no 5, p. 23-32Article in journal (Refereed) Published
Abstract [en]

This paper presents a novel combinatorial model that integrates global register allocation based on ultimate coalescing, spill code optimization, register packing, and multiple register banks with instruction scheduling (including VLIW). The model exploits alternative temporaries that hold the same value as a new concept for ultimate coalescing and spill code optimization. The paper presents Unison as a code generator based on the model and advanced solving techniques using constraint programming. Thorough experiments using MediaBench and a processor (Hexagon) that are typical for embedded systems demonstrate that Unison: is robust and scalable; generates faster code than LLVM (up to 4 1 % with a mean improvement of 7 %); possibly generates optimal code (for 2 9 % of the experiments); effortlessly supports different optimization criteria (code size on par with LLVM). Unison is significant as it addresses the same aspects as traditional code generation algorithms, yet is based on a simple integrated model and robustly can generate optimal code.

Keywords
spill code optimization, ultimate coalescing, combinatorial optimization, register allocation, instruction scheduling
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-154398 (URN)10.1145/2597809.2597815 (DOI)000341937800004 ()2-s2.0-84905660668 (Scopus ID)
Funder
Swedish Research Council, VR 621-2011-6229
Note

QC 20141021

Available from: 2014-10-21 Created: 2014-10-20 Last updated: 2022-06-23Bibliographically approved
Hjort Blindell, G., Menne, C. & Sander, I. (2014). Synthesizing Code for GPGPUs from Abstract Formal Models. In: Dr. Adam Morawiec and Jinnie Hinderscheit (Ed.), Forum on specification and Design Languages (FDL), Munich, Germany, October 14-16, 2014: . Paper presented at Forum on specification and Design Languages(FDL),October 14-16, 2014 Munich, Germany. IEEE conference proceedings
Open this publication in new window or tab >>Synthesizing Code for GPGPUs from Abstract Formal Models
2014 (English)In: Forum on specification and Design Languages (FDL), Munich, Germany, October 14-16, 2014 / [ed] Dr. Adam Morawiec and Jinnie Hinderscheit, IEEE conference proceedings, 2014Conference paper, Published paper (Refereed)
Abstract [en]

Today multiple frameworks exist for elevating thetask of writing programs for GPGPUs, which are massively data-parallel execution platforms. These are needed as writing correctand high-performing applications for GPGPUs is notoriouslydifficult due to the intricacies of the underlying architecture.However, the existing frameworks lack a formal foundation thatmakes them difficult to use together with formal verification,testing, and design space exploration. We present in this papera novel software synthesis tool – called f2cc – which is capableof generating efficient GPGPU code from abstract formal modelsbased on the synchronous model of computation. These modelscan be built using high-level modeling methodologies that hidelow-level architecture details from the developer. The correctnessof the tool has been experimentally validated on models derivedfrom two applications. The experiments also demonstrate that thesynthesized GPGPU code yielded a 28× speedup when executedon a graphics card with 96 cores and compared against asequential version that uses only the CPU.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keywords
Analytical models, computational modeling, system- level design, multicore processing
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-154842 (URN)10.1109/FDL.2014.7119363 (DOI)000412205600018 ()2-s2.0-84940498062 (Scopus ID)978-2-9530504-9-3 (ISBN)
Conference
Forum on specification and Design Languages(FDL),October 14-16, 2014 Munich, Germany
Note

QC 20141117

Available from: 2014-10-28 Created: 2014-10-28 Last updated: 2022-06-23Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0001-6794-6413

Search in DiVA

Show all publications