kth.sePublications
12345671 of 20
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Abstraction, Composition, and Resolvable Ambiguity in Programming Language Implementation
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.ORCID iD: 0000-0003-0669-4085
2024 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Implementing a programming language is a significant undertaking. This is especially problematic for domain-specific languages, whose limited general applicability skews the effort-to-benefit ratio further. On the other hand, a domain-specific language can be very beneficial indeed for a problem in its intended domain. To realize this potential we require more effective implementation approaches. Of course, the technical requirements of a domain-specific language overlap significantly with those of a general purpose language, thus advances for one tend to benefit the other as well.

Programming in general tends to increase productivity through abstraction, various means of either relegating implementation details to more limited portions of a program or omitting them altogether. This dissertation explores abstraction in three areas: reusing work between language implementations, ambiguity in programming language syntax, and automatic representation selection for high-level data structures.

In the first area we provide syncons, short for syntactical construct, where each syncon is a user-defined language construct that includes syntax, binding semantics, and run-time semantics. The run-time semantics are given in terms of macro expansion to another language, which in turn is also defined using syncons. A language is constructed by defining whatever new syncons are required and reusing others from previous languages.

In the second area we formally define resolvable ambiguity, an ambiguity that can be resolved by a programmer by editing an ambiguous program. The class of resolvable languages is larger than the class of unambiguous languages, which gives greater design freedom. For example, a designer need not make arbitrary choices in their grammar to satisfy the needs of most modern parsing methods. This thesis focuses on three cases for resolvable ambiguity: syntactic composition, intuitive grammars, and new language constructs for which no syntactic conventions yet exist. We provide two approaches using resolvable ambiguity; one extending context-free grammars and using its own parser, and one meant for embedding in a conventional parser. The former is more expressive, while the latter admits a proof that all ambiguities are resolvable, which we provide, mechanized in Coq. Both can automatically compute all resolutions of an ambiguity.

In the third area we present an approach where new abstract types, representations, operations, and operation implementations can be introduced by a user in an extensible fashion, e.g., new libraries can add representations, operations, and operation implementations to previously defined abstract types. We also support representation switching, where an abstract type can be represented differently throughout lifetime. We complement the design with a set of solvers prioritizing optimality or quick analysis, and also support transferring a previous solution, even in the presence of minor program changes.

Overall, these contributions allow greater control over the level of abstraction in programming languages as well as their implementations, raising it where beneficial, and enforcing greater specificity where desirable.

Abstract [sv]

Att implementera ett programmeringsspråk är en omfattande utmaning. Detta är särskilt relevant för domänspecifika språk, där implementationsarbetet är än mer disproportionerligt gentemot den allmänna nyttan. Å andra sidan kan ett domänspecifikt språk ge stora produktivitetsvinster inom dess avsedda domän. För att uppnå sådan potential med en rimlig mängd arbete krävs bättre implementationsmetoder. Som väl är så tenderar implementationstekniska framsteg för domänspecifika språk även gynna generella programmeringsspråk och vice versa; denna avhandling är således relevant för båda.

Allmänt tenderar programmering att åstadkomma ökad produktivitet via abstraktion; något som avgränsar implementationsdetaljer till någon liten del av ett program eller utelämnar dem helt och hållet. Denna avhandling arbetar med abstraktioner inom tre områden: återanvändning av implementationsarbete mellan flera programspråksimplementationer, tvetydighet i programspråkssyntax, och automatisering av implementationsval för datastrukturer.

Inom det första området introducerar vi syncons, små användardefinierade språkkonstruktioner som beskriver syntax, namnbindningssemantik, och körsemantik. Den sistnämnda ges i form av makroexpansion till ett annat språk, som även det definieras m.h.a syncons. Ett språk konstrueras sedan genom att återanvända syncons från tidigare språk och konstruera nya i mån av behov.

Inom det andra området ger vi en formell definition av lösbar tvetydighet, vilket innebär att en programmerare kan skriva om ett tvetydigt program till en entydig form. Klassen språk där alla tvetydigheter är lösbara är strikt större än klassen entydiga språk, vilket ökar designfrihet; en språkdesigner behöver inte göra godtyckliga val för att åstadkomma en entydig grammatik, vilket typiskt sett krävs av moderna parsningsmetoder. Detta är användbart i ett flertal situationer; i denna avhandling fokuserar vi på tre fall: syntaktisk sammansättning, intuitiva grammatiker, och nya språkkonstruktioner som saknar allmänt kända konventioner. Vi tillhandahåller två metoder som stödjer lösbar tvetydighet: en som utökar kontextfria grammatiker och använder sig egen parser, och en som kan användas i en mer konventionell parser. Den förstnämnda är mer uttrycksfull, men språk definierade i den andra innehåller enbart lösbara tvetydigheter, vilket vi bevisar i Coq. Båda tillhandahåller en automatisk metod som finner alla entydiga former av tvetydiga program.

Inom det tredje området tillhandahåller vi en design där en användare kan definiera nya abstrakta typer, konkreta representationer, abstrakta operationer, och konkreta operationsimplementationer på ett utbyggbart sätt. Exempelvis kan nya bibliotek lägga till representationer, operationer, och operationsimplementationer till tidigare definierade abstrakta typer. Vi stödjer också representationsbyten, där en abstrakt typ kan använda olika representationer i olika delar av ett program. Vi tillhandahåller också en uppsättning automatiska lösare som väljer implementationer, antingen genom att prioritera optimalitet eller snabb analysis, plus en möjlighet att återanvända en tidigare lösning, även om programet ändrats något sedan dess.

Sammantaget bidrar denna avhandling till att ge större kontroll över abstraktionsnivåer i programmeringsspråk och deras implementationer, dels genom att höja abstraktionsnivån där det är fördelaktigt, men också genom att kräva detaljer och specificitet där det är önskvärt.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. , p. iii, 66
Series
TRITA-EECS-AVL ; 2024:69
Keywords [en]
Domain-specific languages, Programming languages, Syntax, Ambiguity, Data structures
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-353270ISBN: 978-91-8106-044-7 (print)OAI: oai:DiVA.org:kth-353270DiVA, id: diva2:1897734
Public defence
2024-10-08, https://kth-se.zoom.us/j/66561560361, Ka-Sal A, Kistagången 16, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
KTH Royal Institute of Technology, Digital FuturesVinnova, TECoSASwedish Research Council, 2018-04329Swedish Foundation for Strategic Research, FFL15-0032
Note

QC 20240916

Available from: 2024-09-16 Created: 2024-09-14 Last updated: 2024-09-16Bibliographically approved
List of papers
1. Creating domain-specific languages by composing syntactical constructs
Open this publication in new window or tab >>Creating domain-specific languages by composing syntactical constructs
2019 (English)In: 21st International Symposium on Practical Aspects of Declarative Languages, PADL 2019, Springer, 2019, Vol. 11372, p. 187-203Conference paper, Published paper (Refereed)
Abstract [en]

Creating a programming language is a considerable undertaking, even for relatively small domain-specific languages (DSLs). Most approaches to ease this task either limit the flexibility of the DSL or consider entire languages as the unit of composition. This paper presents a new approach using syntactical constructs (also called syncons) for defining DSLs in much smaller units of composition while retaining flexibility. A syntactical construct defines a single language feature, such as an if statement or an anonymous function. Each syntactical construct is fully self-contained: it specifies its own concrete syntax, binding semantics, and runtime semantics, independently of the rest of the language. The runtime semantics are specified as a translation to a user defined target language, while the binding semantics allow name resolution before expansion. Additionally, we present a novel approach for dealing with syntactical ambiguity that arises when combining languages, even if the languages are individually unambiguous. The work is implemented and evaluated in a case study, where small subsets of OCaml and Lua have been defined and composed using syntactical constructs.

Place, publisher, year, edition, pages
Springer, 2019
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743 ; 11372
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241375 (URN)10.1007/978-3-030-05998-9_12 (DOI)000704024700012 ()2-s2.0-85059674904 (Scopus ID)9783030059972 (ISBN)
Conference
21st International Symposium on Practical Aspects of Declarative Languages, PADL 2019, Lisbon, Portugal, 14 January 2019 through 15 January 201
Note

QC 20190121

Available from: 2019-01-21 Created: 2019-01-21 Last updated: 2024-09-14Bibliographically approved
2. Resolvable ambiguity: Principled resolution of syntactically ambiguous programs
Open this publication in new window or tab >>Resolvable ambiguity: Principled resolution of syntactically ambiguous programs
2021 (English)In: CC 2021: Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction, New York, NY, United States, 2021Conference paper, Published paper (Refereed)
Abstract [en]

When building a new programming language, it can be useful to compose parts of existing languages to avoid repeating implementation work. However, this is problematic already at the syntax level, as composing the grammars of language fragments can easily lead to an ambiguous grammar. State-of-the-art parser tools cannot handle ambiguity truly well: either the grammar cannot be handled at all, or the tools give little help to an end-user who writes an ambiguous program. This composability problem is twofold: (i) how can we detect if the composed grammar is ambiguous, and (ii) if it is ambiguous, how can we help a user resolve an ambiguous program? In this paper, we depart from the traditional view of unambiguous grammar design and enable a language designer to work with an ambiguous grammar, while giving users the tools needed to handle these ambiguities. We introduce the concept of resolvable ambiguity wherein a user can resolve an ambiguous program by editing it, as well as an approach to computing the resolutions of an ambiguous program. Furthermore, we present a method based on property-based testing to identify if a composed grammar is unambiguous, resolvably ambiguous, or unresolvably ambiguous. The method is implemented in Haskell and evaluated on a large set of language fragments selected from different languages. The evaluation shows that (i) the approach can handle significantly more cases of language compositions compared to approaches which ban ambiguity altogether, and (ii) that the approach is fast enough to be used in practice.

Place, publisher, year, edition, pages
New York, NY, United States: , 2021
Keywords
Syntax, Ambiguity
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-295417 (URN)10.1145/3446804.3446846 (DOI)2-s2.0-85102312948 (Scopus ID)
Conference
CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, Online, South Korea, 2-3 March 2021
Funder
Swedish Foundation for Strategic Research, FFL15-0032Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Part of proceedings: ISBN 9781450383257, QC 20230117

Available from: 2021-05-20 Created: 2021-05-20 Last updated: 2024-09-14Bibliographically approved
3. Statically Resolvable Ambiguity
Open this publication in new window or tab >>Statically Resolvable Ambiguity
2023 (English)In: Proceedings of the ACM on Programming Languages, E-ISSN 2475-1421, Vol. 7, no POPLArticle in journal (Refereed) Published
Abstract [en]

Traditionally, a grammar defining the syntax of a programming language is typically both context free and unambiguous. However, recent work suggests that an attractive alternative is to use ambiguous grammars, thus postponing the task of resolving the ambiguity to the end user. If all programs accepted by an ambiguous grammar can be rewritten unambiguously, then the parser for the grammar is said to be resolvably ambiguous. Guaranteeing resolvable ambiguity statically-for all programs-is hard, where previous work only solves it partially using techniques based on property-based testing. In this paper, we present the first efficient, practical, and proven correct solution to the statically resolvable ambiguity problem. Our approach introduces several key ideas, including splittable productions, operator sequences, and the concept of a grouper that works in tandem with a standard parser. We prove static resolvability using a Coq mechanization and demonstrate its efficiency and practical applicability by implementing and integrating resolvable ambiguity into an essential part of the standard OCaml parser.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
Parser, Resolvable Ambiguity, Coq, OCaml
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-324344 (URN)10.1145/3571251 (DOI)000910847500058 ()2-s2.0-85146580328 (Scopus ID)
Note

QC 20230228

Available from: 2023-02-28 Created: 2023-02-28 Last updated: 2024-09-14Bibliographically approved
4. Repr Types: One Abstraction to Rule Them All
Open this publication in new window or tab >>Repr Types: One Abstraction to Rule Them All
(English)Manuscript (preprint) (Other academic)
Abstract [en]

The choice of how to represent an abstract type can have a major impact on the performance of a program, yet mainstream compilers cannot perform optimizations at such a high level. When dealing with optimizations of data type representations, an important feature is having extensible representation-flexible data types; the ability for a programmer to add new abstract types and operations, as well as concrete implementations of these, without modifying the compiler or a previously defined library. Many research projects support high-level optimizations through static analysis, instrumentation, or benchmarking, but they are all restricted in at least one aspect of extensibility.

This paper presents a new approach to representation-flexible data types without such restrictions and which still finds efficient optimizations. Our approach centers around a single built-in type repr and function overloading with cost annotations for operation implementations. We evaluate our approach (i) by defining a universal collection type as a library, a single type for all conventional collections, and (ii) by designing and implementing a representation-flexible graph library. Programs using repr types are typically faster than programs with idiomatic representation choices - sometimes dramatically so - as long as the compiler finds good implementations for all operations. Our compiler performs the analysis efficiently by finding optimized solutions quickly and by reusing previous results to avoid recomputations.

National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-353104 (URN)10.48550/arXiv.2409.07950 (DOI)
Funder
Vinnova, TECoSASwedish Research Council, 2020-05346Swedish Research Council, 2023-05526Swedish Research Council, 2018-04329KTH Royal Institute of Technology, Digital Futures
Note

QC 20240920

Available from: 2024-09-14 Created: 2024-09-14 Last updated: 2024-09-20Bibliographically approved

Open Access in DiVA

Palmkvist_Thesis(1772 kB)29 downloads
File information
File name FULLTEXT03.pdfFile size 1772 kBChecksum SHA-512
cc37061f74a39fb150cfdac0f317e504316c186d7a219aa03df4eb2a4f6455b200111d782c6a3ba9fa2add5d378a241e42a6399282857ce4dea2179e97d7d927
Type fulltextMimetype application/pdf

Authority records

Palmkvist, Viktor

Search in DiVA

By author/editor
Palmkvist, Viktor
By organisation
Software and Computer systems, SCS
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 29 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 100 hits
12345671 of 20
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf