BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-7327-3059
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2600-7668
2021 (English). In: Proceedings of the 14th International Conference on Natural Language Generation, 2021, p. 387-403. Conference paper, Published paper (Refereed)
Abstract [en]

An important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment is the distractors: the incorrect but preferably plausible answer options. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release a new such dataset of Swedish MCQs (used for training the model), and propose a methodology for assessing the generated distractors. Evaluation shows that, from a student's perspective, our method generated one or more plausible distractors for more than 50% of the MCQs in our test set. From a teacher's perspective, about 50% of the generated distractors were deemed appropriate. We also provide a thorough analysis of the results.
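The record does not detail how the BERT-based generation is done, so the snippet below is only an illustrative sketch of the general idea: a Swedish masked language model proposes distractor candidates when the correct answer is masked out. The model choice (KB/bert-base-swedish-cased), the helper propose_distractors, and the single-word-answer assumption are ours, not taken from the paper.

    # Illustrative sketch only -- not the paper's method. Mask the correct
    # answer in a sentence and keep the masked language model's most probable
    # alternative fillers as distractor candidates (assumes a one-word answer).
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="KB/bert-base-swedish-cased")

    def propose_distractors(sentence: str, answer: str, top_k: int = 10) -> list[str]:
        """Return the model's top fillers for the masked answer, excluding the answer itself."""
        masked = sentence.replace(answer, fill_mask.tokenizer.mask_token, 1)
        predictions = fill_mask(masked, top_k=top_k)
        return [p["token_str"].strip() for p in predictions
                if p["token_str"].strip().lower() != answer.lower()]

    # Hypothetical usage: candidates to distract from the answer "Stockholm".
    print(propose_distractors("Stockholm är huvudstaden i Sverige.", "Stockholm"))

A realistic distractor generator would additionally need to handle multi-word answers and filter candidates for grammaticality and plausibility in context.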

Place, publisher, year, edition, pages
2021. p. 387-403
Keywords [en]
Multiple-choice questions, Reading comprehension, Small scale, Student perspectives, Swedish, Teachers, Test sets
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-302480. Scopus ID: 2-s2.0-85123291566. OAI: oai:DiVA.org:kth-302480. DiVA, id: diva2:1597195
Conference
14th International Conference on Natural Language Generation, INLG 2021, Virtual/Online, 20-24 September 2021
Funder
Vinnova, 2019-02997
Note

Part of proceedings: ISBN 978-1-954085-51-0

QC 20220301

Available from: 2021-09-24. Created: 2021-09-24. Last updated: 2023-09-14. Bibliographically approved
In thesis
1. Ask and distract: Data-driven methods for the automatic generation of multiple-choice reading comprehension questions from Swedish texts
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Fråga och distrahera : Datadrivna metoder för automatisk generering av flervalsfrågor för att bedöma läsförståelse av svenska
Abstract [en]

Multiple choice questions (MCQs) are widely used for summative assessment in many different subjects. Tasks in this format are particularly appealing because they can be graded swiftly and automatically. However, the process of creating MCQs is far from swift or automatic, and requires considerable expertise both in the specific subject and in test construction.

This thesis explores methods for automatic MCQ generation for assessing the reading comprehension abilities of second-language learners of Swedish. We lay the foundations for MCQ generation research for Swedish by collecting two datasets of reading comprehension MCQs, and by designing and developing methods for generating whole MCQs or their parts. An important contribution is the methods, designed and applied in practice, for the automatic and human evaluation of the generated MCQs.

The best currently available method (as of June 2023) for generating MCQs for assessing reading comprehension in Swedish is ChatGPT, although only around 60% of the generated MCQs were judged acceptable. However, ChatGPT is neither open-source nor free. The best open-source and free-to-use method is the fine-tuned version of SweCTRL-Mini, a foundational model developed as part of this thesis. Nevertheless, all explored methods are still far from being practically useful, but the reported results provide a good starting point for future research.

Abstract [sv]

Flervalsfrågor används ofta för summativ bedömning i många olika ämnen. Flervalsfrågor är tilltalande eftersom de kan bedömas snabbt och automatiskt. Att skapa flervalsfrågor manuellt går dock långt ifrån snabbt, utan är en process som kräver mycket expertis inom det specifika ämnet och även inom provkonstruktion.

Denna avhandling fokuserar på att utforska metoder för automatisk generering av flervalsfrågor för bedömning av läsförståelse hos andraspråksinlärare av svenska. Vi lägger grunden för forskning om generering av flervalsfrågor för svenska genom att samla in två datamängder bestående av flervalsfrågor som testar just läsförståelse, och genom att utforma och utveckla metoder för att generera hela eller delar av flervalsfrågor. Ett viktigt bidrag är de metoder för automatisk och mänsklig utvärdering av genererade flervalsfrågor som har utvecklats och tillämpats i praktiken.

Den bästa för närvarande tillgängliga metoden (i juni 2023) för att generera flervalsfrågor som testar läsförståelse på svenska är ChatGPT (dock bedömdes endast cirka 60% av de genererade flervalsfrågorna som acceptabla). ChatGPT har dock varken öppen källkod eller är gratis. Den bästa metoden med öppen källkod som är också gratis är den finjusterade versionen av SweCTRL-Mini, en “foundational model” som utvecklats som en del av denna avhandling. Alla utforskade metoder är dock långt ifrån användbara i praktiken, men de rapporterade resultaten ger en bra utgångspunkt för framtida forskning.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2023. p. viii, 67
Series
TRITA-EECS-AVL ; 2023:56
Keywords
multiple choice questions, question generation, distractor generation, reading comprehension, second-language learners, L2 learning, Natural Language Generation, flervalsfrågor, frågegenerering, distraktorsgenerering, läsförståelse, andraspråkslärande, L2-inlärning, Natural Language Generation
National Category
Language Technology (Computational Linguistics)
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-336531 (URN). 978-91-8040-661-1 (ISBN)
Public defence
2023-10-17, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Note

QC 20230915

Available from: 2023-09-15. Created: 2023-09-14. Last updated: 2023-09-25. Bibliographically approved

Open Access in DiVA

fulltext (504 kB)
File information: file name FULLTEXT01.pdf, file size 504 kB, checksum (SHA-512):
6d52436b951ebcf2537df082404157bcfd52dc6e33dc96a6971dbedc21e3ca390896b7c0c7b75ee03058affd9a1188500a80580c7a7e7fa0458e724b14ef98ae
Type: fulltext. Mimetype: application/pdf

Other links

Scopus
https://aclanthology.org/2021.inlg-1.43

Authority records

Kalpakchi, Dmytro; Boye, Johan
