Hierarchical Fusion Approaches for Enhancing Multimodal Emotion Recognition in Dialogue-Based Systems: A Systematic Study of Multimodal Emotion Recognition Fusion Strategy
KTH, School of Electrical Engineering and Computer Science (EECS).
2023 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title
Hierarkiska fusionsmetoder för att förbättra multimodal känslomässig igenkänning i dialogbaserade system : En systematisk studie av fusionsstrategier för multimodal känslomässig igenkänning (Swedish)
Abstract [en]

Multimodal Emotion Recognition (MER) has gained increasing attention due to its strong performance. In this thesis, we evaluate feature-level fusion, decision-level fusion, and two proposed hierarchical fusion methods for MER systems on a dialogue-based dataset. The first hierarchical approach integrates abstract features across different temporal levels, employing RNN-based and transformer-based context modeling to capture nearby and global context, respectively. The second hierarchical strategy incorporates shared information between modalities by facilitating modality interactions through attention mechanisms. Results show that RNN-based hierarchical fusion surpasses the baseline by 2%, while transformer-based context modeling and the modality-interaction method improve accuracy by 0.5% and 0.6%, respectively. These findings underscore the importance of capturing meaningful emotional cues in nearby context and emotional invariants in dialogue-based MER systems. We also emphasize the crucial role of the text modality. Overall, our research highlights the potential of hierarchical fusion approaches for enhancing MER system performance, presenting systematic strategies supported by empirical evidence.

Abstract [sv]

Multimodal Emotion Recognition (MER) har fått ökad uppmärksamhet på grund av dess exceptionella prestanda. I denna avhandling utvärderar vi feature-level fusion, decision-level fusion och två föreslagna hierarkiska fusion-metoder för MER-system med hjälp av en dialogbaserad dataset. Den första hierarkiska metoden integrerar abstrakta funktioner över olika tidsnivåer genom att använda RNN-baserade och transformer-baserade tekniker för kontextmodellering för att fånga närliggande och globala kontexter, respektive. Den andra hierarkiska strategin innefattar delad information mellan modaliteter genom att underlätta modalitetsinteraktioner genom uppmärksamhetsmekanismer. Resultaten visar att RNN-baserad hierarkisk fusion överträffar baslinjen med 2%, medan transformer-baserad kontextmodellering och modellering av modalitetsinteraktion ökar noggrannheten med 0.5% respektive 0.6%. Dessa resultat understryker betydelsen av att fånga meningsfulla känslomässiga ledtrådar i närliggande sammanhang och emotionella invarianter i dialog MER-system. Vi betonar också den avgörande rollen som textmodalitet spelar. Övergripande betonar vår forskning potentialen för hierarkiska fusion-metoder för att förbättra prestandan i MER-system, genom att presentera systematiska strategier som stöds av empirisk evidens.
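To make the two fusion strategies described in the abstract more concrete, the sketch below shows one plausible way such a pipeline could be assembled in PyTorch: per-utterance text and audio features are projected and fused at the feature level, a cross-modal attention step lets the text representation attend to the audio representation (the modality-interaction idea), and a bidirectional GRU models nearby dialogue context before per-utterance classification. All module names, feature dimensions, and layer choices (GRU, nn.MultiheadAttention) are illustrative assumptions and are not taken from the thesis itself.

```python
# Illustrative sketch only -- not the architecture from the thesis.
# Assumes pre-extracted per-utterance text and audio features.
import torch
import torch.nn as nn


class HierarchicalFusionSketch(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, hidden_dim=256, num_classes=7):
        super().__init__()
        # Project each modality into a shared space (utterance level).
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Modality interaction: text queries attend over audio (hypothetical choice).
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                                batch_first=True)
        # Feature-level fusion of the attended text and the audio representation.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        # RNN-based context modeling over the dialogue (captures nearby context).
        self.context_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True,
                                  bidirectional=True)
        # Per-utterance emotion classifier on top of the context-aware features.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feats, audio_feats):
        # text_feats:  (batch, num_utterances, text_dim)
        # audio_feats: (batch, num_utterances, audio_dim)
        t = torch.relu(self.text_proj(text_feats))
        a = torch.relu(self.audio_proj(audio_feats))
        t_attended, _ = self.cross_attn(query=t, key=a, value=a)
        fused = torch.relu(self.fuse(torch.cat([t_attended, a], dim=-1)))
        context, _ = self.context_rnn(fused)   # dialogue-level (nearby) context
        return self.classifier(context)        # (batch, num_utterances, num_classes)


if __name__ == "__main__":
    model = HierarchicalFusionSketch()
    text = torch.randn(2, 10, 768)    # 2 dialogues, 10 utterances each
    audio = torch.randn(2, 10, 128)
    print(model(text, audio).shape)   # torch.Size([2, 10, 7])
```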

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. 24
Series
TRITA-EECS-EX ; 2023:394
Keywords [en]
Multimodal, emotion recognition, hierarchical fusion, context modeling, attention mechanism, feature-level fusion, decision-level fusion, machine learning, deep learning
Keywords [sv]
Multimodal, emotion recognition, hierarkisk fusion, kontextmodellering, uppmärksamhetsmekanism, funktionssammanfogning på nivån för egenskaper, beslutsnivås-funktionssammanfogning, maskininlärning, djupinlärning
National Category
Computer Sciences; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-333924
OAI: oai:DiVA.org:kth-333924
DiVA, id: diva2:1787725
Presentation
2023-06-12, via Zoom https://kth-se.zoom.us/j/62014600444, Online, Stockholm, 15:00 (English)
Supervisors
Examiners
Available from: 2023-08-19. Created: 2023-08-14. Last updated: 2025-02-18. Bibliographically approved.

Open Access in DiVA

fulltext (3082 kB), 488 downloads
File information
File name: FULLTEXT01.pdf
File size: 3082 kB
Checksum (SHA-512):
08c67a3127c499c0b69902c869f326789dbaec09ba717359fc6a9ad2609cdb6b0789e53d464f09b18d3234d2339d1dbe700520a1ee637bfa5d608e67bdc9739f
Type: fulltext
Mimetype: application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer Sciences; Computer and Information Sciences

Total: 490 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 246 hits