Using LLMs to Grade Clinical Reasoning for Medical Students in Virtual Patient Dialogues
KTH, School of Electrical Engineering and Computer Science (EECS). Karolinska Institute, Sweden. ORCID iD: 0009-0001-0445-630X
KTH, School of Electrical Engineering and Computer Science (EECS). ORCID iD: 0009-0004-8417-6106
Karolinska Institute, Sweden.
Karolinska Institute, Sweden. ORCID iD: 0000-0002-4875-5395
2025 (English). In: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue / [ed] Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin, SIGDIAL, 2025, p. 750-763. Conference paper, Published paper (Refereed)
Abstract [en]

This paper evaluates the use of large language models (LLMs) for grading clinical reasoning in rheumatology medical history virtual patient (VP) simulations. The study explores the feasibility of using state-of-the-art LLMs, including general-purpose models with various prompting strategies, such as zero-shot, analysis-first, and chain-of-thought prompting, as well as reasoning models. The models' performance in grading transcribed dialogues from VP simulations conducted on a Furhat robot was evaluated against human expert annotations. Human experts initially achieved a 65% inter-rater agreement, which resulted in a pooled Cohen’s Kappa of 0.71 and 82.3% correctness. The best LLM, o3-mini, achieved a pooled Kappa of 0.68 and 81.5% correctness, with response times under 30 seconds, compared to approximately 6 minutes for human grading. These results suggest that automatic assessments can approach human reliability under controlled simulation conditions while delivering time and cost efficiencies.
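
For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of Cohen's Kappa for two raters, plus one common way of pooling it across several graded criteria: average the observed and chance agreement over the criteria before forming the ratio. The abstract does not say how the authors pooled their values, so the pooling rule, the function names, and the toy pass/fail labels below are all assumptions made for illustration.

```python
from collections import Counter


def agreement_terms(rater_a, rater_b):
    """Return (observed agreement, chance agreement) for two label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return p_o, p_e


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa; undefined (division by zero) when chance agreement is 1."""
    p_o, p_e = agreement_terms(rater_a, rater_b)
    return (p_o - p_e) / (1 - p_e)


def pooled_kappa(items):
    """Pooled Kappa over several criteria (assumed pooling rule, see lead-in).

    items: list of (rater_a_labels, rater_b_labels), one pair per criterion.
    """
    terms = [agreement_terms(a, b) for a, b in items]
    mean_p_o = sum(t[0] for t in terms) / len(terms)
    mean_p_e = sum(t[1] for t in terms) / len(terms)
    return (mean_p_o - mean_p_e) / (1 - mean_p_e)


# Hypothetical toy data: two raters grading four dialogues on two criteria.
criterion_1 = (["pass", "pass", "fail", "fail"], ["pass", "fail", "fail", "fail"])
criterion_2 = (["pass", "fail", "pass", "fail"], ["pass", "fail", "pass", "fail"])

print(cohen_kappa(*criterion_1))
print(pooled_kappa([criterion_1, criterion_2]))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a percentage agreement and a Kappa value computed from the same annotations can differ noticeably.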

Place, publisher, year, edition, pages
SIGDIAL, 2025. p. 750-763
National category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-374882
OAI: oai:DiVA.org:kth-374882
DiVA, id: diva2:2025321
Conference
The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, Aug 25-27, 2025
Note

Part of ISBN 979-8-89176-329-6

QC 20260107

Available from: 2026-01-06 Created: 2026-01-06 Last updated: 2026-01-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Schiött, Jonathan; Ivegren, William; Skantze, Gabriel

Search in DiVA

By author/editor
Schiött, Jonathan; Ivegren, William; Parodis, Ioannis; Skantze, Gabriel
By organisation
School of Electrical Engineering and Computer Science (EECS); Speech, Music and Hearing, TMH
Computer and Information Sciences
