kth.se Publications
1 - 13 of 13
  • 1. Geilhufe, R. Matthias
    et al.
    Borysov, Stanislav S.
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA.
    Kalpakchi, Dmytro
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA.
    Balatsky, Alexander V.
    KTH, Centres, Nordic Institute for Theoretical Physics NORDITA.
    Towards novel organic high-Tc superconductors: Data mining using density of states similarity search (2018). In: Physical Review Materials, E-ISSN 2475-9953, Vol. 2, no 2, article id 024802. Article in journal (Refereed)
    Abstract [en]

    Identifying novel functional materials with desired key properties is an important part of bridging the gap between fundamental research and technological advancement. In this context, high-throughput calculations combined with data-mining techniques have greatly accelerated this process in different areas of research in recent years. The strength of a data-driven approach for materials prediction lies in narrowing down a search space of thousands of materials to a subset of prospective candidates. Recently, the open-access organic materials database OMDB was released, providing electronic structure data for thousands of previously synthesized three-dimensional organic crystals. Based on the OMDB, we report on the implementation of a novel density of states similarity search tool that retrieves materials whose density of states is similar to that of a reference material. The tool is based on the approximate nearest neighbor algorithm as implemented in the ANNOY library and can be used via the OMDB web interface. The approach presented here is wide-ranging and can be applied to various problems where the density of states governs certain key properties of a material. As a first application, we report on materials exhibiting electronic structure similarities to the aromatic hydrocarbon p-terphenyl, which was recently discussed as a potential organic high-temperature superconductor with a transition temperature on the order of 120 K under strong potassium doping. Although the mechanism driving this remarkable transition temperature remains under debate, we argue that the density of states, reflecting the electronic structure of a material, might be a crucial ingredient for the observed high Tc.
    To provide candidates that might exhibit comparable properties, we present 15 purely organic materials whose electronic structure has features similar to that of p-terphenyl and which also tend to share structural similarities with p-terphenyl, such as space group symmetries, chemical composition, and molecular structure. Experimental verification of these candidates might lead to a better understanding of the underlying mechanism if similar superconducting properties are revealed.
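A minimal sketch of the density of states similarity search described in this abstract, assuming each material's DOS has already been sampled on a common energy grid and stored as a plain list of floats. It uses exact brute-force cosine similarity rather than ANNOY's approximate index, so it only illustrates the idea of ranking materials by DOS similarity to a reference; the material IDs, toy vectors, and function names are hypothetical, and the similarity metric actually used by the OMDB tool is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two DOS vectors sampled on the same energy grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(reference, database, k=3):
    """Return the IDs of the k materials whose DOS is closest to `reference`.

    Exact linear scan over all materials; an approximate nearest neighbor
    library such as ANNOY would instead query an index built in advance.
    """
    scored = [(cosine_similarity(reference, dos), mid) for mid, dos in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [mid for _, mid in scored[:k]]


# Hypothetical toy database: material ID -> DOS values on a shared energy grid.
DATABASE = {
    "A": [0.0, 1.0, 2.0, 1.0],
    "B": [2.0, 1.0, 0.0, 0.0],
    "C": [0.0, 2.0, 4.0, 2.1],
}
```

With thousands of materials, an approximate index trades a small loss in recall for much faster queries than this linear scan, which is the motivation for using a library like ANNOY.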

  • 2.
    Kalpakchi, Dmytro
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ask and distract: Data-driven methods for the automatic generation of multiple-choice reading comprehension questions from Swedish texts (2023). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Multiple choice questions (MCQs) are widely used for summative assessment in many different subjects. The tasks in this format are particularly appealing because they can be graded swiftly and automatically. However, the process of creating MCQs is far from swift or automatic and requires a lot of expertise both in the specific subject and also in test construction.

    This thesis explores methods for automatic MCQ generation for assessing the reading comprehension abilities of second-language learners of Swedish. We lay the foundations for MCQ generation research for Swedish by collecting two datasets of reading comprehension MCQs and by designing and developing methods for generating whole MCQs or their parts. Another important contribution is a set of methods, designed and applied in practice, for the automatic and human evaluation of the generated MCQs.

    The best currently available method (as of June 2023) for generating MCQs for assessing reading comprehension in Swedish is ChatGPT, although only around 60% of the MCQs it generated were judged acceptable. However, ChatGPT is neither open-source nor free. The best open-source and free-to-use method is the fine-tuned version of SweCTRL-Mini, a foundational model developed as part of this thesis. Nevertheless, all explored methods are still far from being useful in practice, but the reported results provide a good starting point for future research.

  • 3.
    Kalpakchi, Dmytro
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish. Manuscript (preprint) (Other academic)
    Abstract [en]

    We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar et al. (2019), which means that users of the SweCTRL-Mini model can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of Swedish novels. In this article, we provide (1) a detailed account of the utilized training data and text pre-processing steps, to the extent that it is possible to check whether a specific phrase or source was part of the training data, and (2) an evaluation of the model on both discriminative tasks, using automatic evaluation methods, and generative tasks, using human referees. We also compare the generative capabilities of the model with those of GPT-3. SweCTRL-Mini is fully open and available for download.

  • 4.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish (2022). Conference paper (Refereed)
    Abstract [en]

    This paper evaluates the quality of reading comprehension questions automatically generated from Swedish text using the Quinductor method, a lightweight, data-driven, but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.

  • 5.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset (2021). In: Proceedings of the 14th International Conference on Natural Language Generation, 2021, p. 387-403. Conference paper (Refereed)
    Abstract [en]

    An important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment is creating the distractors: the incorrect but preferably plausible answer options. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release a new dataset of Swedish MCQs (used for training the model) and propose a methodology for assessing the generated distractors. Evaluation shows that, from a student's perspective, our method generated one or more plausible distractors for more than 50% of the MCQs in our test set. From a teacher's perspective, about 50% of the generated distractors were deemed appropriate. We also provide a thorough analysis of the results.
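The two rates reported in this abstract are aggregated at different levels: the student-perspective figure counts MCQs that have at least one plausible distractor, while the teacher-perspective figure counts individual distractors. The sketch below illustrates that distinction, assuming each judgement is a simple yes/no per generated distractor; the function names and toy judgements are hypothetical and are not data from the paper.

```python
def share_mcqs_with_plausible_distractor(judgements):
    """Per-MCQ rate: fraction of MCQs with at least one distractor judged plausible.

    `judgements` holds one list of booleans per MCQ, one boolean per distractor.
    """
    if not judgements:
        return 0.0
    return sum(1 for mcq in judgements if any(mcq)) / len(judgements)


def share_distractors_appropriate(judgements):
    """Per-distractor rate: fraction of all individual distractors judged appropriate."""
    flat = [ok for mcq in judgements for ok in mcq]
    return sum(flat) / len(flat) if flat else 0.0


# Invented toy judgements for four MCQs with three distractors each.
TOY = [
    [True, False, False],
    [False, False, False],
    [True, True, False],
    [False, True, False],
]
```

Applied to the same toy judgements, the per-MCQ rate is 0.75 while the per-distractor rate is about 0.33, so figures computed at these two levels are not directly comparable.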

  • 6.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Generation and Evaluation of Multiple-choice Reading Comprehension Questions for Swedish. Manuscript (preprint) (Other academic)
    Abstract [en]

    Multiple-choice questions (MCQs) provide a widely used means of assessing reading comprehension. The automatic generation of such MCQs is a challenging language-technological problem that also has interesting educational applications. This article presents several methods for automatically producing reading comprehension MCQs from Swedish text. Unlike previous approaches, we construct models that generate the whole MCQ in one go, rather than using a pipeline architecture. Furthermore, we propose a two-stage method for evaluating the quality of the generated MCQs, first evaluating on carefully designed single-sentence texts and then on texts from the SFI national exams. An extensive evaluation of the MCQ-generating capabilities of 12 different models using this two-stage scheme reveals that GPT-based models surpass smaller models fine-tuned on small-scale datasets for this specific problem.

  • 7.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Minor changes make a difference: a case study on the consistency of UD-based dependency parsers (2021). In: Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021), Association for Computational Linguistics (ACL), 2021, p. 96-108. Conference paper (Refereed)
    Abstract [en]

    Many downstream applications use dependency trees and thus rely on dependency parsers producing correct, or at least consistent, output. However, dependency parsers are trained using machine learning and are therefore susceptible to unwanted inconsistencies due to biases in the training data. This paper explores the effects of such biases in four languages (English, Swedish, Russian, and Ukrainian) through an experiment in which we study the effect of replacing numerals in sentences. We show that such seemingly insignificant changes in the input can cause large differences in the output, and suggest that data augmentation can remedy the problems.
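A sketch of the kind of numeral-replacement consistency test described in this abstract, assuming a simple regular expression is enough to find numerals and that `parse` is any hypothetical callable returning one (head, relation) pair per token. The replacement numerals are arbitrary choices, not those used in the paper. A parser passes the check only if every numeral-swapped variant receives the same structure as the original sentence.

```python
import re

# Standalone digit sequences; "3.5" is treated as two numerals (a simplification).
NUMERAL = re.compile(r"\b\d+\b")
REPLACEMENTS = ("3", "17", "250", "1999")  # arbitrary, hypothetical choices


def replace_numerals(sentence, replacement):
    """Replace every standalone digit sequence in `sentence` with `replacement`."""
    return NUMERAL.sub(replacement, sentence)


def perturbations(sentence, replacements=REPLACEMENTS):
    """Return variants of `sentence` that differ only in their numerals."""
    if not NUMERAL.search(sentence):
        return []
    return [replace_numerals(sentence, r) for r in replacements]


def is_consistent(parse, sentence, replacements=REPLACEMENTS):
    """True if `parse` assigns the same structure to all numeral variants.

    `parse` is a hypothetical callable: sentence -> list of (head, deprel) per token.
    """
    variants = [sentence] + perturbations(sentence, replacements)
    structures = [parse(v) for v in variants]
    return all(s == structures[0] for s in structures[1:])
```

Because each numeral is swapped for another single token, the token count stays fixed, so any difference in the returned structures points to the parser reacting to the numeral's surface form.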

  • 8.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    Quasi: a synthetic Question-Answering dataset in Swedish using GPT-3 and zero-shot learning (2023). In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, p. 477-491. Conference paper (Refereed)
    Abstract [en]

    This paper describes the creation and evaluation of a synthetic dataset of Swedish multiple-choice questions (MCQs) for reading comprehension using GPT-3. Although GPT-3 is trained mostly on English data, with only 0.11% of Swedish texts in its training material, the model still managed to generate MCQs in Swedish. About 44% of the generated MCQs turned out to be of sufficient quality, i.e., they were grammatically correct and relevant, with exactly one answer alternative being correct and the others being plausible but wrong. We provide a detailed analysis of the errors and shortcomings of the rejected MCQs, as well as an analysis of the difficulty level of the accepted MCQs. In addition to giving insights into GPT-3, the synthetic dataset could be used for training and evaluating special-purpose MCQ-generating models.

  • 9.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies (2024). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, p. 217-255. Article in journal (Refereed)
    Abstract [en]

    We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near that required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature in terms of automatic evaluation metrics and performs well in human evaluation.

  • 10.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    SpaceRefNet: a neural approach to spatial reference resolution in a real city environment (2019). In: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, Association for Computational Linguistics, 2019, p. 422-431. Conference paper (Refereed)
    Abstract [en]

    Adding interactive capabilities to pedestrian wayfinding systems in the form of spoken dialogue will make them more natural to humans. Such an interactive wayfinding system needs to continuously understand and interpret pedestrians' utterances referring to the spatial context. Achieving this requires the system to identify exophoric referring expressions in the utterances and link these expressions to the geographic entities in the vicinity. This exophoric spatial reference resolution problem is difficult, as there are often several dozen candidate referents. We present a neural network-based approach for identifying pedestrians' references (using a network called RefNet) and resolving them to appropriate geographic objects (using a network called SpaceRefNet). Both methods show promising results, outperforming the respective baselines and previously reported results in the literature.

  • 11.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation (2022). In: LREC 2022: Thirteenth International Conference On Language Resources And Evaluation / [ed] Calzolari, N.; Bechet, F.; Blache, P.; Choukri, K.; Cieri, C.; Declerck, T.; Goggi, S.; Isahara, H.; Maegaard, B.; Mazo, H.; Odijk, H.; Piperidis, S., European Language Resources Association (ELRA), 2022, p. 856-866. Conference paper (Refereed)
    Abstract [en]

    We release an internationalized annotation and human evaluation bundle, called Textinator, along with documentation and video tutorials. Textinator allows annotating data for a wide variety of NLP tasks, and its user interface is offered in multiple languages, lowering the entry threshold for domain experts. The latter is, in fact, a rare feature among annotation tools, and it makes it possible to control for unintended biases introduced by hiring only English-speaking annotators. We illustrate the rarity of this feature through a thorough, systematic comparison of Textinator with previously published annotation tools along 9 different axes (internationalization being one of them). To encourage researchers to design their human evaluation before starting to annotate data, Textinator offers an easy-to-use tool for human evaluations that allows importing surveys with potentially hundreds of evaluation items in one click. We finish by presenting several use cases of annotation and evaluation projects conducted using pre-release versions of Textinator. The presented use cases do not represent Textinator's full annotation or evaluation capabilities, and interested readers are referred to the online documentation for more information.

  • 12.
    Kalpakchi, Dmytro
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Boye, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    UDon2: a library for manipulating Universal Dependencies trees (2020). In: Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), 2020, p. 120-125. Conference paper (Refereed)
    Abstract [en]

    UDon2 is an open-source library for manipulating dependency trees represented in the CoNLL-U format. The library is compatible with Universal Dependencies. UDon2 is aimed at developers of downstream Natural Language Processing applications that require manipulating dependency trees on the sentence level (complementing other available tools, which are geared towards working with treebanks).
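To illustrate what manipulating a CoNLL-U tree on the sentence level involves, here is a small standard-library sketch that reads one sentence into linked nodes and extracts a subtree. It is not UDon2's API, whose functions are not described in this abstract; the `Node` class, the function names, and the sample sentence are hypothetical. It keeps only a few of the ten CoNLL-U columns (ID, FORM, UPOS, HEAD, DEPREL) and skips multiword-token ranges and empty nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    form: str
    upos: str
    head: int
    deprel: str
    children: list = field(default_factory=list)


def parse_conllu_sentence(text):
    """Build a dependency tree from one CoNLL-U sentence and return its root node."""
    nodes = {}
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip comment lines such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        node = Node(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7])
        nodes[node.id] = node
    root = None
    for node in nodes.values():
        if node.head == 0:
            root = node
        else:
            nodes[node.head].children.append(node)
    return root


def subtree_forms(node):
    """Return the word forms of the subtree rooted at `node`, in sentence order."""
    collected, stack = [], [node]
    while stack:
        current = stack.pop()
        collected.append(current)
        stack.extend(current.children)
    return [n.form for n in sorted(collected, key=lambda n: n.id)]


# Hypothetical sample sentence; columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = Dogs chase cats",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "chase", "chase", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "cats", "cat", "NOUN", "_", "_", "2", "obj", "_", "_"]),
])
```

Linking each node to its children once, at load time, makes subtree operations such as extraction or pruning simple traversals rather than repeated scans over the HEAD column.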

  • 13.
    Willemsen, Bram
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Kalpakchi, Dmytro
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Collecting Visually-Grounded Dialogue with A Game Of Sorts (2022). In: Proceedings of the 13th Conference on Language Resources and Evaluation / [ed] Calzolari, N.; Bechet, F.; Blache, P.; Choukri, K.; Cieri, C.; Declerck, T.; Goggi, S.; Isahara, H.; Maegaard, B.; Mazo, H.; Odijk, H.; Piperidis, S., European Language Resources Association (ELRA), 2022, p. 2257-2268. Conference paper (Refereed)
    Abstract [en]

    An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker must merely specify their expression appropriately so that the addressee can successfully identify the target referent. However, referring in conversation is a collaborative process that cannot be aptly characterized as an exchange of minimally specified referring expressions. Concerns have been raised that prior work on visually-grounded dialogue makes assumptions reflecting an oversimplified view of conversation and the referential process. We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call “A Game Of Sorts”. In our game, players are tasked with reaching agreement on how to rank a set of images given some sorting criterion through a largely unrestricted, role-symmetric dialogue. By putting emphasis on the argumentation in this mixed-initiative interaction, we collect discussions that involve the collaborative referential process. We describe the results of a small-scale data collection experiment with the proposed task. All discussed materials, including the collected data, the codebase, and a containerized version of the application, are publicly available.
