Using LLMs to Grade Clinical Reasoning for Medical Students in Virtual Patient Dialogues
KTH, Skolan för elektroteknik och datavetenskap (EECS). Karolinska Institute, Sweden. ORCID iD: 0009-0001-0445-630X
KTH, Skolan för elektroteknik och datavetenskap (EECS). ORCID iD: 0009-0004-8417-6106
Karolinska Institute, Sweden.
Karolinska Institute, Sweden. ORCID iD: 0000-0002-4875-5395
2025 (English). In: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue / [ed] Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin, SIGDIAL, 2025, pp. 750-763. Conference paper, published paper (peer-reviewed)
Abstract [en]

This paper evaluates the use of large language models (LLMs) for grading clinical reasoning during virtual patient (VP) simulations of rheumatology medical history taking. The study explores the feasibility of using state-of-the-art LLMs, covering both general-purpose models with various prompting strategies (zero-shot, analysis-first, and chain-of-thought prompting) and reasoning models. The performance of these models in grading transcribed dialogues from VP simulations conducted on a Furhat robot was evaluated against human expert annotations. Human experts initially achieved a 65% inter-rater agreement, which resulted in a pooled Cohen’s Kappa of 0.71 and 82.3% correctness. The best LLM, o3-mini, achieved a pooled Kappa of 0.68 and 81.5% correctness, with response times under 30 seconds, compared to approximately 6 minutes for human grading. These results suggest that automatic assessment can approach human reliability under controlled simulation conditions while saving time and cost.
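
For readers unfamiliar with the agreement metric named in the abstract, the following is a minimal sketch of Cohen's kappa and of one common way to pool it across several graded items: average the observed and the chance agreement over items, then apply the kappa formula once. The abstract does not say which pooling definition the authors used, so the pooling rule, the function names, and the toy labels below are all assumptions made for illustration, not the paper's method or data.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (observed agreement, chance agreement) for two label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(rater_a)
    # Fraction of cases where both raters gave the same label.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    return p_o, p_e


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for one item; NaN when chance agreement is 1."""
    p_o, p_e = agreement_stats(rater_a, rater_b)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined in this degenerate case
    return (p_o - p_e) / (1 - p_e)


def pooled_kappa(items):
    """Pooled kappa over items (assumed definition: average p_o and p_e first).

    `items` is a list of (rater_a_labels, rater_b_labels) pairs, one per item.
    """
    stats = [agreement_stats(a, b) for a, b in items]
    mean_p_o = sum(s[0] for s in stats) / len(stats)
    mean_p_e = sum(s[1] for s in stats) / len(stats)
    if mean_p_e == 1.0:
        return float("nan")
    return (mean_p_o - mean_p_e) / (1 - mean_p_e)


# Toy labels (1 = criterion met, 0 = not met); purely hypothetical data.
human = [1, 1, 0, 0]
model = [1, 0, 0, 0]
print(cohen_kappa(human, model))                         # 0.5
print(pooled_kappa([(human, model), ([1, 0], [1, 0])]))  # 0.75
```

Averaging the agreement terms before applying the formula keeps items with little label variation from producing unstable per-item kappas, which is the usual motivation for a pooled estimate.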

Place, publisher, year, edition, pages
SIGDIAL, 2025. pp. 750-763
Identifiers
URN: urn:nbn:se:kth:diva-374882; OAI: oai:DiVA.org:kth-374882; DiVA, id: diva2:2025321
Conference
The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, Aug 25-27, 2025
Note

Part of ISBN 979-8-89176-329-6

QC 20260107

Available from: 2026-01-06. Created: 2026-01-06. Last updated: 2026-01-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

People

Schiött, Jonathan; Ivegren, William; Skantze, Gabriel

Search in DiVA

By author/editor
Schiött, Jonathan; Ivegren, William; Parodis, Ioannis; Skantze, Gabriel