Publications KTH (kth.se)
Publications (10 of 38)
Misra, S. & Boye, J. (2024). Nested Noun Phrase Identification using BERT. In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. Paper presented at Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, May 20 2024 - May 25 2024 (pp. 12138-12143). European Language Resources Association (ELRA)
Nested Noun Phrase Identification using BERT
2024 (English). In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, p. 12138-12143. Conference paper, Published paper (Refereed)
Abstract [en]

For several NLP tasks, an important substep is the identification of noun phrases in running text. This has typically been done by “chunking”, a way of finding minimal noun phrases by token classification. However, chunking-like methods do not represent the fact that noun phrases can be nested. This paper presents a novel method for finding all noun phrases in a sentence, nested to an arbitrary depth, using the BERT model for token classification. We show that our proposed method achieves very good results for both Swedish and English.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024
Keywords
BERT, chunking, language models, nested phrases, noun phrase, partial parsing
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-348778 (URN); 2-s2.0-85195990394 (Scopus ID)
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, May 20 2024 - May 25 2024
Note

Part of ISBN 9782493814104

QC 20240701

Available from: 2024-06-27. Created: 2024-06-27. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2024). Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies. Natural Language Engineering, 30(2), 217-255
Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies
2024 (English). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 30, no 2, p. 217-255. Article in journal (Refereed), Published
Abstract [en]

We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature in terms of automatic evaluation metrics and shows good performance in human evaluation.

Place, publisher, year, edition, pages
Cambridge University Press (CUP), 2024
Keywords
Natural language generation, Evaluation, Multilinguality, Question generation, Reading comprehension
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-326862 (URN); 10.1017/s1351324923000037 (DOI); 000939777300001; 2-s2.0-85189534486 (Scopus ID)
Funder
Vinnova, 2019-02997
Note

QC 20230515

Available from: 2023-05-15. Created: 2023-05-15. Last updated: 2025-02-25. Bibliographically approved
Kalpakchi, D. & Boye, J. (2023). Quasi: a synthetic Question-Answering dataset in Swedish using GPT-3 and zero-shot learning. In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). Paper presented at The 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), 22-24 May 2023, Tórshavn, Faroe Islands (pp. 477-491).
Quasi: a synthetic Question-Answering dataset in Swedish using GPT-3 and zero-shot learning
2023 (English). In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, p. 477-491. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes the creation and evaluation of a synthetic dataset of Swedish multiple-choice questions (MCQs) for reading comprehension using GPT-3. Although GPT-3 is trained mostly on English data, with only 0.11% Swedish text in its training material, the model still managed to generate MCQs in Swedish. About 44% of the generated MCQs turned out to be of sufficient quality, i.e. they were grammatically correct and relevant, with exactly one answer alternative being correct and the others being plausible but wrong. We provide a detailed analysis of the errors and shortcomings of the rejected MCQs, as well as an analysis of the level of difficulty of the accepted MCQs. In addition to giving insights into GPT-3, the synthetic dataset could be used for training and evaluation of special-purpose MCQ-generating models.

National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-327972 (URN)
Conference
The 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), 22-24 May 2023, Tórshavn, Faroe Islands
Note

QC 20230602

Available from: 2023-06-02. Created: 2023-06-02. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2022). Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish. Paper presented at The 9th Swedish Language Technology Conference (SLTC 2022), Stockholm, Sweden, 23–25 November 2022.
Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish
2022 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an evaluation of the quality of reading comprehension questions automatically generated from Swedish text using the Quinductor method. This method is a lightweight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.

National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-326889 (URN); 10.48550/arXiv.2211.15568 (DOI)
Conference
The 9th Swedish Language Technology Conference (SLTC 2022), Stockholm, Sweden, 23–25 November 2022
Funder
Vinnova, 2019-02997
Note

QC 20230515

Available from: 2023-05-15. Created: 2023-05-15. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2022). Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation. In: Calzolari, N., Bechet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mazo, H., Odijk, H., Piperidis, S. (Eds.), LREC 2022: Thirteenth International Conference on Language Resources and Evaluation. Paper presented at 13th International Conference on Language Resources and Evaluation (LREC), 20-25 June 2022, Marseille, France (pp. 856-866). European Language Resources Association (ELRA)
Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation
2022 (English). In: LREC 2022: Thirteenth International Conference on Language Resources and Evaluation / [ed] Calzolari, N., Bechet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mazo, H., Odijk, H., Piperidis, S., European Language Resources Association (ELRA), 2022, p. 856-866. Conference paper, Published paper (Refereed)
Abstract [en]

We release an internationalized annotation and human evaluation bundle, called Textinator, along with documentation and video tutorials. Textinator allows annotating data for a wide variety of NLP tasks, and its user interface is offered in multiple languages, lowering the entry threshold for domain experts. Such internationalization is, in fact, a rare feature among annotation tools; it allows controlling for possible unintended biases introduced by hiring only English-speaking annotators. We illustrate the rarity of this feature by presenting a thorough systematic comparison of Textinator to previously published annotation tools along 9 different axes (internationalization being one of them). To encourage researchers to design their human evaluation before starting to annotate data, Textinator offers an easy-to-use tool for human evaluations that allows importing surveys with potentially hundreds of evaluation items in one click. We finish by presenting several use cases of annotation and evaluation projects conducted using pre-release versions of Textinator. The presented use cases do not represent Textinator's full annotation or evaluation capabilities, and interested readers are referred to the online documentation for more information.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2022
Keywords
annotation tool, human evaluation tool, natural language processing, natural language generation
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-324335 (URN); 10.5281/zenodo.6497334 (DOI); 000889371700090; 2-s2.0-85144462359 (Scopus ID)
Conference
13th International Conference on Language Resources and Evaluation (LREC), 20-25 June 2022, Marseille, France
Note

QC 20230228

Available from: 2023-02-28. Created: 2023-02-28. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. In: Proceedings of the 14th International Conference on Natural Language Generation. Paper presented at 14th International Conference on Natural Language Generation, INLG 2021, Virtual/Online, 20-24 September 2021 (pp. 387-403).
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset
2021 (English). In: Proceedings of the 14th International Conference on Natural Language Generation, 2021, p. 387-403. Conference paper, Published paper (Refereed)
Abstract [en]

Distractors, the incorrect but preferably plausible answer options, are an important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release a new such dataset of Swedish MCQs (used for training the model), and propose a methodology for assessing the generated distractors. Evaluation shows that, from a student's perspective, our method generated one or more plausible distractors for more than 50% of the MCQs in our test set. From a teacher's perspective, about 50% of the generated distractors were deemed appropriate. We also provide a thorough analysis of the results.

Keywords
Multiple-choice questions, Reading comprehension, Small scale, Student perspectives, Swedish, Teachers, Test sets
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-302480 (URN); 2-s2.0-85123291566 (Scopus ID)
Conference
14th International Conference on Natural Language Generation, INLG 2021, Virtual/Online, 20-24 September 2021
Funder
Vinnova, 2019-02997
Note

Part of proceedings: ISBN 978-1-954085-51-0

QC 20220301

Available from: 2021-09-24. Created: 2021-09-24. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2021). Minor changes make a difference: a case study on the consistency of UD-based dependency parsers. In: Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021). Paper presented at UDW 2021 - 5th Workshop on Universal Dependencies, held as part of SyntaxFest 2021, Sofia, 21-25 March 2021 (pp. 96-108). Association for Computational Linguistics (ACL)
Minor changes make a difference: a case study on the consistency of UD-based dependency parsers
2021 (English). In: Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021), Association for Computational Linguistics (ACL), 2021, p. 96-108. Conference paper, Published paper (Refereed)
Abstract [en]

Many downstream applications use dependency trees, and thus rely on dependency parsers producing correct, or at least consistent, output. However, dependency parsers are trained using machine learning, and are therefore susceptible to unwanted inconsistencies due to biases in the training data. This paper explores the effects of such biases in four languages (English, Swedish, Russian, and Ukrainian) through an experiment where we study the effect of replacing numerals in sentences. We show that such seemingly insignificant changes in the input can cause large differences in the output, and suggest that data augmentation can remedy the problems.

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2021
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-326888 (URN); 2-s2.0-85138675937 (Scopus ID)
Conference
UDW 2021 - 5th Workshop on Universal Dependencies, held as part of SyntaxFest 2021, Sofia, 21-25 March 2021
Funder
Vinnova, 2019-02997
Note

Part of proceedings: ISBN 978-195591717-9

QC 20230515

Available from: 2023-05-15. Created: 2023-05-15. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2020). UDon2: a library for manipulating Universal Dependencies trees. In: Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020). Paper presented at 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 8-13 December 2020 (pp. 120-125).
UDon2: a library for manipulating Universal Dependencies trees
2020 (English). In: Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), 2020, p. 120-125. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

UDon2 is an open-source library for manipulating dependency trees represented in the CoNLL-U format. The library is compatible with Universal Dependencies. UDon2 is aimed at developers of downstream Natural Language Processing applications that require manipulating dependency trees at the sentence level, complementing other available tools geared towards working with treebanks.

National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-288878 (URN)
Conference
28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 8-13 December 2020
Funder
Vinnova
Note

QC 20210115

Available from: 2021-01-14. Created: 2021-01-14. Last updated: 2025-02-07. Bibliographically approved
Kalpakchi, D. & Boye, J. (2019). SpaceRefNet: a neural approach to spatial reference resolution in a real city environment. In: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. Paper presented at 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2019, 11-13 September 2019, Stockholm, Sweden (pp. 422-431). Association for Computational Linguistics
SpaceRefNet: a neural approach to spatial reference resolution in a real city environment
2019 (English). In: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, Association for Computational Linguistics, 2019, p. 422-431. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Adding interactive capabilities to pedestrian wayfinding systems in the form of spoken dialogue will make them more natural to humans. Such an interactive wayfinding system needs to continuously understand and interpret pedestrians’ utterances referring to the spatial context. Achieving this requires the system to identify exophoric referring expressions in the utterances, and to link these expressions to the geographic entities in the vicinity. This exophoric spatial reference resolution problem is difficult, as there are often several dozen candidate referents. We present a neural network-based approach for identifying pedestrians’ references (using a network called RefNet) and resolving them to appropriate geographic objects (using a network called SpaceRefNet). Both networks show promising results, beating the respective baselines and results previously reported in the literature.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2019
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-262883 (URN); 10.18653/v1/w19-5949 (DOI); 000591510500049; 2-s2.0-85091595033 (Scopus ID)
Conference
20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2019, 11-13 September 2019, Stockholm, Sweden
Note

QC 20210914

Available from: 2019-10-22. Created: 2019-10-22. Last updated: 2025-02-07. Bibliographically approved
Götze, J. & Boye, J. (2017). Reference resolution for pedestrian wayfinding systems. In: 20th AGILE International Conference on Geographic Information Science, 2017. Paper presented at 20th AGILE International Conference on Geographic Information Science, 9-12 May 2017 (pp. 59-75). Kluwer Academic Publishers
Reference resolution for pedestrian wayfinding systems
2017 (English). In: 20th AGILE International Conference on Geographic Information Science, 2017, Kluwer Academic Publishers, 2017, p. 59-75. Conference paper, Published paper (Refereed)
Abstract [en]

References to objects in our physical environment are common, especially in language about wayfinding. Advanced wayfinding systems that interact with the pedestrian by means of (spoken) natural language therefore need to be able to resolve references given by pedestrians (i.e. understand what entity the pedestrian is referring to). The contribution of this paper is a probabilistic approach to reference resolution in a large-scale, real city environment, where the context changes constantly as the pedestrians are moving. The geographic situation, including information about objects’ location and type, is represented using OpenStreetMap data.

Place, publisher, year, edition, pages
Kluwer Academic Publishers, 2017
Keywords
Data-driven methods, Natural language processing, Openstreetmap, Pedestrian navigation, Probabilistic approach, Reference resolution, Wayfinding, Air navigation, Geographic information systems, Mobile devices, Natural language processing systems, Telecommunication services, Probabilistic approaches, Way-finding, Location based services
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-207439 (URN); 10.1007/978-3-319-56759-4_4 (DOI); 000460575300004; 2-s2.0-85017505207 (Scopus ID); 9783319472881 (ISBN); 9783319567587 (ISBN)
Conference
20th AGILE International Conference on Geographic Information Science, 9-12 May 2017
Note

QC 20170523

Available from: 2017-05-23. Created: 2017-05-23. Last updated: 2022-06-27. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2600-7668