Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2140-0612
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8579-1790
2024 (English). In: INLG 2024 - 17th International Natural Language Generation Conference, Proceedings of the Conference, Association for Computational Linguistics, 2024, p. 453-469
Conference paper, Published paper (Refereed)
Abstract [en]

We propose an approach to referring expression generation (REG) in visually grounded dialogue that is meant to produce referring expressions (REs) that are both discriminative and discourse-appropriate. Our method constitutes a two-stage process. First, we model REG as a text- and image-conditioned next-token prediction task. REs are autoregressively generated based on their preceding linguistic context and a visual representation of the referent. Second, we propose the use of discourse-aware comprehension guiding as part of a generate-and-rerank strategy through which candidate REs generated with our REG model are reranked based on their discourse-dependent discriminatory power. Results from our human evaluation indicate that our proposed two-stage approach is effective in producing discriminative REs, with higher performance in terms of text-image retrieval accuracy for reranked REs compared to those generated using greedy decoding.
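
The generate-and-rerank step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `rerank` and `make_toy_scorer`, the representation of images as attribute sets, and the word-overlap softmax that stands in for a learned comprehension model. The only idea carried over from the abstract is that candidate REs are scored together with their preceding dialogue context and ordered by how strongly they single out the intended referent.

```python
import math
from typing import Callable, Sequence

# A scorer maps (dialogue context, candidate RE) to the probability that a
# comprehension model would resolve the RE to the intended referent.
Scorer = Callable[[str, str], float]


def rerank(context: str, candidates: Sequence[str], score: Scorer) -> list[str]:
    """Return the candidates ordered from most to least discriminative."""
    return sorted(candidates, key=lambda cand: score(context, cand), reverse=True)


def make_toy_scorer(images: dict[str, set[str]], target: str) -> Scorer:
    """Hypothetical stand-in for a comprehension model (illustration only).

    Each image is a set of attribute words. The score is a softmax over
    word-overlap counts, evaluated at the target image.
    """
    def score(context: str, expression: str) -> float:
        # Context and expression are scored jointly, so earlier turns count.
        words = set((context + " " + expression).lower().split())
        overlaps = {name: len(words & attrs) for name, attrs in images.items()}
        total = sum(math.exp(v) for v in overlaps.values())
        return math.exp(overlaps[target]) / total
    return score


# Toy data, invented for this example.
images = {
    "img_a": {"dog", "brown", "small"},
    "img_b": {"dog", "black", "large"},
}
score = make_toy_scorer(images, target="img_a")
context = "A: do you have a dog picture?"
candidates = ["the dog", "the small brown dog", "the brown one"]
ranked = rerank(context, candidates, score)
```

In this toy setting "the dog" matches both images equally and is ranked last, while the candidate that mentions attributes unique to the target is ranked first. In a real system the candidates would presumably come from sampling the REG model, and the scorer would be a trained comprehension model rather than word overlap.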

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2024. p. 453-469
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-355343
Scopus ID: 2-s2.0-105016680595
OAI: oai:DiVA.org:kth-355343
DiVA, id: diva2:1908912
Conference
17th International Natural Language Generation Conference, INLG 2024, 23 September 2024 - 27 September 2024, Tokyo, Japan
Projects
tmh_grounding
Note

QC 20241105

Part of ISBN 979-889176122-3

Available from: 2024-10-29. Created: 2024-10-29. Last updated: 2025-10-13. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Willemsen, Bram; Skantze, Gabriel
