Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations
Motherbrain, EQT Group, Stockholm, Sweden.
Motherbrain, EQT Group, Stockholm, Sweden.
KTH. Motherbrain, EQT Group, Stockholm, Sweden; Télécom Paris, Palaiseau, France; Eurecom, Biot, France. ORCID iD: 0009-0001-6451-0136
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-3089-0345
2024 (English). In: TrustNLP 2024 - 4th Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop. Association for Computational Linguistics (ACL), 2024, p. 196-201. Conference paper, Published paper (Refereed).
Abstract [en]

Multimodal Large Language Models (MLLMs) are commonly evaluated using costly annotated multimodal benchmarks. However, these benchmarks often struggle to keep pace with the rapidly advancing requirements of MLLM evaluation. We propose GenCeption, a novel and annotation-free MLLM evaluation framework that requires only unimodal data to assess inter-modality semantic coherence and inversely reflects the models' inclination to hallucinate. Analogous to the popular DrawCeption game, GenCeption starts from a non-textual sample and proceeds through a series of iterative description and generation steps. Semantic drift across iterations is quantified using the GC@T metric. Our empirical findings validate GenCeption's efficacy, showing strong correlations with popular MLLM benchmarking results. GenCeption may be extended to mitigate training data contamination by utilizing ubiquitous, previously unseen unimodal data.
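
The iterative procedure described in the abstract can be sketched compactly. Below is a minimal, hypothetical Python illustration, not the authors' implementation: describe, generate, and embed are placeholder callables standing in for an MLLM captioner, a text-to-image generator, and an image encoder (e.g., a CLIP-style embedder), and the cosine-similarity aggregation is only one plausible instantiation of a GC@T-style score; the paper's exact definition may weight iterations differently.

```python
from typing import Callable, List
import numpy as np

def genception_score(
    seed_image: np.ndarray,
    describe: Callable[[np.ndarray], str],      # hypothetical: MLLM, image -> description
    generate: Callable[[str], np.ndarray],      # hypothetical: generator, description -> image
    embed: Callable[[np.ndarray], np.ndarray],  # hypothetical: encoder, image -> embedding
    T: int = 5,
) -> float:
    """Run T description/generation rounds from a seed image and return
    the mean cosine similarity of each round's image to the seed, as a
    GC@T-style score (an assumed aggregation, not the paper's formula)."""
    seed_vec = embed(seed_image)
    seed_vec = seed_vec / np.linalg.norm(seed_vec)
    image = seed_image
    sims: List[float] = []
    for _ in range(T):
        image = generate(describe(image))  # one DrawCeption-style round
        vec = embed(image)
        sims.append(float(seed_vec @ (vec / np.linalg.norm(vec))))
    return float(np.mean(sims))  # higher = less semantic drift
```

Under this reading, a higher score over T rounds indicates less semantic drift from the seed sample, i.e., stronger inter-modality coherence and a lower tendency to hallucinate.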

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024. p. 196-201
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-361979
DOI: 10.18653/v1/2024.trustnlp-1.16
Scopus ID: 2-s2.0-105000832382
OAI: oai:DiVA.org:kth-361979
DiVA, id: diva2:1949652
Conference
4th Workshop on Trustworthy Natural Language Processing, TrustNLP 2024, Mexico City, Mexico, June 21, 2024
Note

Part of ISBN 9798891761131

QC 20250409

Available from: 2025-04-03. Created: 2025-04-03. Last updated: 2025-04-09. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Senane, Zineb; Yang, Fangkai
