kth.se Publications
Publications (10 of 20)
Cao, L., Buchner, V., Senane, Z. & Yang, F. (2025). GenCeption: Evaluate vision LLMs with unlabeled unimodal data. Computer speech & language (Print), 93, Article ID 101785.
2025 (English) In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 93, article id 101785. Article in journal (Refereed). Published
Abstract [en]

Multimodal Large Language Models (MLLMs) are typically assessed using expensive annotated multimodal benchmarks, which often lag behind the rapidly evolving demands of MLLM evaluation. This paper outlines and validates GenCeption, a novel, annotation-free evaluation method that requires only unimodal data to measure inter-modality semantic coherence and inversely assesses MLLMs’ tendency to hallucinate. This approach eliminates the need for costly data annotation, minimizes the risk of training data contamination, is expected to result in slower benchmark saturation, and avoids the illusion of emerging abilities. Inspired by the DrawCeption game, GenCeption begins with a non-textual sample and proceeds through iterative description and generation steps. The semantic drift across iterations is quantified using the GC@T metric. While GenCeption is principally applicable to MLLMs across various modalities, this paper focuses on its implementation and validation for Vision LLMs (VLLMs). Based on the GenCeption method, we establish the MMECeption benchmark for evaluating VLLMs, and compare the performance of several popular VLLMs and human annotators. Our empirical results validate GenCeption's effectiveness, demonstrating strong correlations with established VLLM benchmarks. VLLMs still significantly lag behind human performance and struggle especially with text-intensive tasks.
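The iterative procedure the abstract describes can be sketched compactly. The snippet below is a minimal illustration, not the paper's implementation: `describe_fn`, `generate_fn`, and `embed_fn` are hypothetical stand-ins for the VLLM under test, a text-to-image generator, and a semantic image encoder, and the score here simply averages each generated image's cosine similarity to the seed over T iterations (the published GC@T metric may weight or aggregate iterations differently).

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def genception_score(seed_image, describe_fn, generate_fn, embed_fn, T=5):
    """Run T description -> generation iterations starting from seed_image
    and average each generated image's similarity to the seed."""
    ref = embed_fn(seed_image)
    image, sims = seed_image, []
    for _ in range(T):
        description = describe_fn(image)   # model under test: image -> text
        image = generate_fn(description)   # fixed generator: text -> image
        sims.append(cosine(embed_fn(image), ref))
    return sum(sims) / T                   # high score = low semantic drift
```

With perfect round-tripping the score stays at 1.0; a model that hallucinates drifts away from the seed and the score decays, which is why no annotations are needed.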

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Benchmark, Evaluation, Multimodal large language model
National Category
Computer graphics and computer vision; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-361201 (URN)
10.1016/j.csl.2025.101785 (DOI)
2-s2.0-85219499531 (Scopus ID)
Note

QC 20250313

Available from: 2025-03-12. Created: 2025-03-12. Last updated: 2025-03-13. Bibliographically approved.
Cao, L., Buchner, V., Senane, Z. & Yang, F. (2024). Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations. In: TrustNLP 2024 - 4th Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop. Paper presented at 4th Workshop on Trustworthy Natural Language Processing, TrustNLP 2024, Mexico City, Mexico, June 21, 2024 (pp. 196-201). Association for Computational Linguistics (ACL)
2024 (English) In: TrustNLP 2024 - 4th Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop, Association for Computational Linguistics (ACL), 2024, p. 196-201. Conference paper, Published paper (Refereed)
Abstract [en]

Multimodal Large Language Models (MLLMs) are commonly evaluated using costly annotated multimodal benchmarks. However, these benchmarks often struggle to keep pace with the rapidly advancing requirements of MLLM evaluation. We propose GenCeption, a novel and annotation-free MLLM evaluation framework that merely requires unimodal data to assess inter-modality semantic coherence and inversely reflects the models' inclination to hallucinate. Analogous to the popular DrawCeption game, GenCeption initiates with a non-textual sample and undergoes a series of iterative description and generation steps. Semantic drift across iterations is quantified using the GC@T metric. Our empirical findings validate GenCeption's efficacy, showing strong correlations with popular MLLM benchmarking results. GenCeption may be extended to mitigate training data contamination by utilizing ubiquitous, previously unseen unimodal data.

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-361979 (URN)
10.18653/v1/2024.trustnlp-1.16 (DOI)
2-s2.0-105000832382 (Scopus ID)
Conference
4th Workshop on Trustworthy Natural Language Processing, TrustNLP 2024, Mexico City, Mexico, June 21, 2024
Note

Part of ISBN 9798891761131

QC 20250409

Available from: 2025-04-03. Created: 2025-04-03. Last updated: 2025-04-09. Bibliographically approved.
Yang, F., Gao, Y., Ma, R., Zojaji, S., Castellano, G. & Peters, C. (2021). A dataset of human and robot approach behaviors into small free-standing conversational groups. PLOS ONE, 16(2), Article ID e0247364.
2021 (English) In: PLOS ONE, E-ISSN 1932-6203, Vol. 16, no 2, article id e0247364. Article in journal (Refereed). Published
Abstract [en]

The analysis and simulation of the interactions that occur in group situations is important when humans and artificial agents, physical or virtual, must coordinate when inhabiting similar spaces or even collaborate, as in the case of human-robot teams. Artificial systems should adapt to the natural interfaces of humans rather than the other way around. Such systems should be sensitive to human behaviors, which are often social in nature, and account for human capabilities when planning their own behaviors. A limiting factor relates to our understanding of how humans behave with respect to each other and with artificial embodiments, such as robots. To this end, we present CongreG8 (pronounced 'con-gregate'), a novel dataset containing the full-body motions of free-standing conversational groups of three humans and a newcomer that approaches the groups with the intent of joining them. The aim has been to collect an accurate and detailed set of positioning, orienting and full-body behaviors when a newcomer approaches and joins a small group. The dataset contains trials from human and robot newcomers. Additionally, it includes questionnaires about the personality of participants (BFI-10), their perception of robots (Godspeed), and custom human/robot interaction questions. An overview and analysis of the dataset is also provided, which suggests that human groups are more likely to alter their configuration to accommodate a human newcomer than a robot newcomer. We conclude by providing three use cases that the dataset has already been applied to in the domains of behavior detection and generation in real and virtual environments.

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2021
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-292958 (URN)
10.1371/journal.pone.0247364 (DOI)
000624536800095 ()
33630908 (PubMedID)
2-s2.0-85102096812 (Scopus ID)
Note

QC 20210419

Available from: 2021-04-19. Created: 2021-04-19. Last updated: 2022-06-25. Bibliographically approved.
Yang, F., Yin, W., Inamura, T., Björkman, M. & Peters, C. (2020). Group Behavior Recognition Using Attention- and Graph-Based Neural Networks. In: ECAI 2020: 24th European Conference on Artificial Intelligence. Paper presented at the 24th European Conference on Artificial Intelligence (ECAI), 29 August - 8 September 2020.
2020 (English) In: ECAI 2020: 24th European Conference on Artificial Intelligence, 2020. Conference paper, Published paper (Refereed)
Abstract [en]

When a conversational group is approached by a newcomer who wishes to join it, the group may dynamically react by adjusting their positions and orientations in order to accommodate it. These reactions represent important cues to the newcomer about if and how they should plan their approach. The recognition and analysis of such socially compliant dynamic group behaviors have rarely been studied in depth and remain a challenging problem in social multi-agent systems. In this paper, we present novel group behavior recognition models, attention-based and graph-based, that consider behaviors on both the individual and group levels. The attention-based category consists of Approach Group Net (AGNet) and Approach Group Transformer (AGTransformer). They share a similar architecture and use attention mechanisms to encode both temporal and spatial information on both the individual and group levels. The graph-based models consist of Approach Group Graph Convolutional Networks (AG-GCN), which combine Multi-Spatial-Temporal Graph Convolutional Networks (MST-GCN) on the individual level and Graph Convolutional Networks (GCN) on the group level, with multi-temporal stages. The individual level learns the spatial and temporal movement patterns of each agent, while the group level captures the relations and interactions of multiple agents. In order to train and evaluate these models, we collected a full-body motion-captured dataset of multiple individuals in conversational groups. Experiments performed using our models to recognize group behaviors from the collected dataset show that AG-GCN, with additional distance and orientation information, achieves the best performance. We also present a multi-agent interaction use case in a virtual environment to show how the models can be practically applied.
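For readers unfamiliar with the graph-based models named above, a single graph-convolution step (the building block shared by GCN and, at the group level, AG-GCN) can be sketched without any deep-learning framework. This is a generic illustration under assumed conventions (self-loops plus symmetric normalisation, ReLU activation), not the AG-GCN architecture itself:

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: aggregate each agent's features from its
    neighbours (adjacency with self-loops, symmetrically normalised), then
    apply a linear map followed by ReLU."""
    n = len(adj)
    # add self-loops
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # symmetric normalisation: D^{-1/2} A D^{-1/2}
    deg = [sum(row) for row in a]
    a = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # neighbourhood aggregation
    agg = [[sum(a[i][k] * feats[k][f] for k in range(n))
            for f in range(len(feats[0]))] for i in range(n)]
    # linear map + ReLU
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]
```

Stacking such layers lets each agent's representation absorb the positions and orientations of its group neighbours, which is what the group level of the models described above exploits.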

Series
Frontiers in Artificial Intelligence and Applications
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-287334 (URN)
10.3233/FAIA200273 (DOI)
000650971301111 ()
2-s2.0-85091792981 (Scopus ID)
Conference
The 24th European Conference on Artificial Intelligence (ECAI), 29 August - 8 September 2020
Note

QC 20210621

Available from: 2020-12-07. Created: 2020-12-07. Last updated: 2025-02-01. Bibliographically approved.
Yang, F., Yin, W., Björkman, M. & Peters, C. (2020). Impact of Trajectory Generation Methods on Viewer Perception of Robot Approaching Group Behaviors. In: 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020. Paper presented at 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020; Virtual, Naples; Italy; 31 August 2020 through 4 September 2020 (pp. 509-516). Institute of Electrical and Electronics Engineers (IEEE), Article ID 9223584.
2020 (English) In: 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 509-516, article id 9223584. Conference paper, Published paper (Refereed)
Abstract [en]

Mobile robots that approach free-standing conversational groups to join them should behave in a safe and socially acceptable way. Existing trajectory generation methods focus on collision avoidance with pedestrians, and the models that generate approach behaviors into groups are evaluated in simulation. However, it is challenging to generate approach and join trajectories that avoid collisions with group members while also ensuring that they do not invoke feelings of discomfort. In this paper, we conducted an experiment to examine the impact of three trajectory generation methods for a mobile robot to approach groups from multiple directions: a Wizard-of-Oz (WoZ) method, a procedural social-aware navigation model (PM) and a novel generative adversarial model imitating human approach behaviors (IL). The experiment also compared two camera viewpoints and static versus quasi-dynamic groups. The latter refers to a group whose members change orientation and position throughout the approach task, even though the group entity remains static in the environment. This represents a more realistic but challenging scenario for the robot. We evaluated the three methods with objective measurements and subjective measurements of viewer perception, and results show that WoZ and IL have comparable performance, and both perform better than PM under most conditions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-287333 (URN)
10.1109/RO-MAN47096.2020.9223584 (DOI)
000598571700074 ()
2-s2.0-85095762675 (Scopus ID)
Conference
29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020; Virtual, Naples; Italy; 31 August 2020 through 4 September 2020
Note

QC 20201208

Available from: 2020-12-07. Created: 2020-12-07. Last updated: 2025-02-09. Bibliographically approved.
Yang, F. (2020). Simulating Group Interactions through Machine Learning and Human Perception. (Doctoral dissertation). KTH Royal Institute of Technology
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Human-Robot/Agent Interaction is well researched in many areas, but approaches commonly either focus on dyadic interactions or crowd simulations. However, the intermediate structure between individuals and crowds, i.e., small groups, has been studied less. In small group situations, it is challenging for mobile robots or agents to approach free-standing conversational groups in a socially acceptable manner. It requires the robot or agent to plan trajectories that avoid collisions with people and consider the perception of group members to make them feel comfortable. Previous methods are mostly procedural with handcrafted features that limit the realism and adaptation of the simulation. In this thesis, Human-Robot/Agent Interaction is investigated at multiple levels, including individuals, crowds, and small groups. Firstly, this thesis is an exploration of proxemics in dyadic interactions in virtual environments. It investigates the impact of various embodiments on human perception and sensitivities. A related toolkit is developed as a foundation for simulating virtual characters in the subsequent research. Secondly, this thesis extends proxemics to crowd simulation and trajectory prediction by proposing neighbor perception models. It then focuses on group interactions in which robots/agents approach small groups in order to join them. To address the challenges above, novel procedural models based on social space and machine learning models, including generative adversarial neural networks, state refinement LSTM, reinforcement learning, and imitation learning, are proposed to generate approach behaviors. A novel dataset of full-body motion-captured markers was also collected in order to support machine learning approaches. Finally, these methods are evaluated in scenarios involving humans, virtual agents, and physical robots.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2020
National Category
Robotics and automation; Computer graphics and computer vision; Social Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-287337 (URN)
Public defence
2021-01-25, VIC Studio, Lindstedtsvägen 5, floor 4, KTH, 114 28 Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20201208

Available from: 2020-12-08. Created: 2020-12-07. Last updated: 2025-02-05. Bibliographically approved.
Yang, F. & Peters, C. (2019). AppGAN: Generative Adversarial Networks for Generating Robot Approach Behaviors into Small Groups of People. In: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). Paper presented at 28th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2019, New Delhi, India, 14-18 October 2019. Institute of Electrical and Electronics Engineers (IEEE), Article ID 8956425.
2019 (English) In: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Institute of Electrical and Electronics Engineers (IEEE), 2019, article id 8956425. Conference paper, Published paper (Refereed)
Abstract [en]

Robots that navigate to approach free-standing conversational groups should do so in a safe and socially acceptable manner. This is challenging since it not only requires the robot to plot trajectories that avoid collisions with members of the group, but also to do so without making those in the group feel uncomfortable, for example, by moving too close to them or approaching them from behind. Previous trajectory prediction models focus primarily on formations of walking pedestrians, and those models that do consider approach behaviours into free-standing conversational groups typically have handcrafted features and are only evaluated via simulation methods, limiting their effectiveness. In this paper, we propose AppGAN, a novel trajectory prediction model capable of generating trajectories into free-standing conversational groups trained on a dataset of safe and socially acceptable paths. We evaluate the performance of our model with state-of-the-art trajectory prediction methods on a semi-synthetic dataset. We show that our model outperforms baselines by taking advantage of the GAN framework and our novel group interaction module.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords
Air navigation, Forecasting, Trajectories, Adversarial networks, Approach behaviours, Group interaction, State of the art, Trajectory prediction, Robots
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-275636 (URN)
10.1109/RO-MAN46459.2019.8956425 (DOI)
000533896300141 ()
2-s2.0-85078870082 (Scopus ID)
Conference
28th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2019, New Delhi, India, 14-18 October 2019
Note

QC 20200609

Part of ISBN 978-1-7281-2622-7

Available from: 2020-06-09. Created: 2020-06-09. Last updated: 2024-10-15. Bibliographically approved.
Yang, F. & Peters, C. (2019). App-LSTM: Data-driven generation of socially acceptable trajectories for approaching small groups of agents. In: HAI 2019 - Proceedings of the 7th International Conference on Human-Agent Interaction. Paper presented at 7th International Conference on Human-Agent Interaction, HAI 2019, Kyoto, Japan, October 06-10, 2019 (pp. 144-152). Association for Computing Machinery, Inc
2019 (English) In: HAI 2019 - Proceedings of the 7th International Conference on Human-Agent Interaction, Association for Computing Machinery, Inc, 2019, p. 144-152. Conference paper, Published paper (Refereed)
Abstract [en]

While many works involving human-agent interactions have focused on individuals or crowds, modelling interactions on the group scale has not been considered in depth. Simulation of interactions with groups of agents is vital in many applications, enabling more comprehensive and realistic behavior encompassing all possibilities between crowd and individual levels. In this paper, we propose a novel neural network App-LSTM to generate the approach trajectory of an agent towards a small free-standing conversational group of agents. The App-LSTM model is trained on a dataset of approach behaviors towards the group. Since current publicly available datasets for these encounters are limited, we develop a social-aware navigation method as a basis for creating a semi-synthetic dataset composed of a mixture of real and simulated data representing safe and socially-acceptable approach trajectories. Via a group interaction module, App-LSTM then captures the position and orientation features of the group and refines the current state of the approaching agent iteratively to better focus on the current intention of group members. We show our App-LSTM outperforms baseline methods in generating approaching group trajectories.

Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2019
Keywords
Approach behaviors, Human-agent interaction, Machine learning, Small groups, Trajectory generation, Air navigation, Behavioral research, Iterative methods, Learning systems, Trajectories, Approach trajectories, Data Driven Generation, Human agent interactions, Novel neural network, Position and orientations, Long short-term memory
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-272353 (URN)
10.1145/3349537.3351885 (DOI)
000719339300021 ()
2-s2.0-85077121010 (Scopus ID)
Conference
7th International Conference on Human-Agent Interaction, HAI 2019, Kyoto, Japan, October 06-10, 2019
Note

QC 20200513

Part of ISBN 9781450369220

Available from: 2020-05-13. Created: 2020-05-13. Last updated: 2024-10-15. Bibliographically approved.
Saikia, H., Yang, F. & Peters, C. (2019). Criticality-based collision avoidance prioritization for crowd navigation. In: HAI 2019 - Proceedings of the 7th International Conference on Human-Agent Interaction. Paper presented at 7th International Conference on Human-Agent Interaction, HAI 2019, Kyoto, Japan, October 06-10, 2019 (pp. 153-161). Association for Computing Machinery, Inc
2019 (English) In: HAI 2019 - Proceedings of the 7th International Conference on Human-Agent Interaction, Association for Computing Machinery, Inc, 2019, p. 153-161. Conference paper, Published paper (Refereed)
Abstract [en]

Goal-directed agent navigation in crowd simulations involves a complex decision-making process. An agent must avoid all collisions with static or dynamic obstacles (such as other agents) while keeping a trajectory faithful to its target. This seemingly global optimization problem can be broken down into smaller local optimization problems through the concept of criticality. Our method resolves critical agents - agents that are likely to come within collision range of each other - in order of priority using a Particle Swarm Optimization scheme. The resolution involves altering the velocities of agents to avoid criticality. Results from our method show that the navigation problem can be solved in several important test cases with a minimal number of collisions and minimal deviation from the target direction. We demonstrate the efficiency and correctness of our method by comparing it to four other well-known algorithms and evaluating them with various quality measures.
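The criticality-resolution step described above can be illustrated with a small, self-contained sketch. This is not the paper's implementation: the cost function, the penalty weight, the safety radius, and the PSO hyperparameters below are illustrative assumptions. A candidate velocity is scored by its deviation from the agent's preferred velocity plus a penalty whenever the predicted closest-approach distance to another agent falls below the safety radius, and a standard PSO searches for a low-cost velocity.

```python
import math
import random

def time_to_closest_approach(p1, v1, p2, v2):
    """Time at which two constant-velocity agents are closest (clamped to >= 0)."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    vv = vx * vx + vy * vy
    if vv == 0.0:
        return 0.0
    return max(0.0, -(rx * vx + ry * vy) / vv)

def min_distance(p1, v1, p2, v2):
    """Predicted closest-approach distance between two agents."""
    t = time_to_closest_approach(p1, v1, p2, v2)
    dx = (p2[0] + v2[0] * t) - (p1[0] + v1[0] * t)
    dy = (p2[1] + v2[1] * t) - (p1[1] + v1[1] * t)
    return math.hypot(dx, dy)

def resolve_velocity(p, v_pref, others, radius=1.0, n_particles=30, iters=40, seed=0):
    """PSO over candidate velocities: penalise predicted closest-approach
    distances below `radius`, plus deviation from the preferred velocity."""
    rng = random.Random(seed)

    def cost(v):
        c = (v[0] - v_pref[0]) ** 2 + (v[1] - v_pref[1]) ** 2
        for op, ov in others:
            d = min_distance(p, v, op, ov)
            if d < radius:
                c += 100.0 * (radius - d) ** 2  # collision penalty (assumed weight)
        return c

    # standard PSO with inertia and cognitive/social terms
    parts = [[v_pref[0] + rng.uniform(-1, 1), v_pref[1] + rng.uniform(-1, 1)]
             for _ in range(n_particles)]
    vels = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [list(x) for x in parts]
    pbest_c = [cost(x) for x in parts]
    g = min(range(n_particles), key=lambda i: pbest_c[i])
    gbest, gbest_c = list(pbest[g]), pbest_c[g]
    for _ in range(iters):
        for i, x in enumerate(parts):
            for d in range(2):
                vels[i][d] = (0.7 * vels[i][d]
                              + 1.5 * rng.random() * (pbest[i][d] - x[d])
                              + 1.5 * rng.random() * (gbest[d] - x[d]))
                x[d] += vels[i][d]
            c = cost(x)
            if c < pbest_c[i]:
                pbest[i], pbest_c[i] = list(x), c
                if c < gbest_c:
                    gbest, gbest_c = list(x), c
    return tuple(gbest)
```

In a head-on scenario the preferred velocity predicts a zero closest-approach distance; the resolved velocity trades a small deviation from the goal direction for a safe predicted separation, which is the local resolution the abstract describes.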

Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2019
Keywords
Crowd navigation, Crowd simulation, Optimization, Criticality (nuclear fission), Decision making, Global optimization, Particle swarm optimization (PSO), Complex decision, Dynamic obstacles, Global optimization problems, Local optimizations, Navigation problem, Quality measures, Target direction, Navigation
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-272352 (URN)
10.1145/3349537.3351887 (DOI)
000719339300022 ()
2-s2.0-85077131900 (Scopus ID)
Conference
7th International Conference on Human-Agent Interaction, HAI 2019, Kyoto, Japan, October 06-10, 2019
Note

QC 20211005

Part of ISBN 9781450369220

Available from: 2020-05-13. Created: 2020-05-13. Last updated: 2024-10-21. Bibliographically approved.
Gao, Y., Yang, F., Frisk, M., Hernandez, D., Peters, C. & Castellano, G. (2019). Learning Socially Appropriate Robot Approaching Behavior Toward Groups using Deep Reinforcement Learning. In: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). Paper presented at 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 14-18, 2019, New Delhi, India. IEEE
2019 (English) In: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), IEEE, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Deep reinforcement learning has recently been widely applied in robotics to study tasks such as locomotion and grasping, but its application to social human-robot interaction (HRI) remains a challenge. In this paper, we present a deep learning scheme that acquires a prior model of robot approaching behavior in simulation and applies it to real-world interaction with a physical robot approaching groups of humans. The scheme, which we refer to as Staged Social Behavior Learning (SSBL), considers different stages of learning in social scenarios. We learn robot approaching behaviors towards small groups in simulation and evaluate the performance of the model using objective and subjective measures in a perceptual study and an HRI user study with human participants. Results show that our model generates more socially appropriate behavior compared to a state-of-the-art model.

Place, publisher, year, edition, pages
IEEE, 2019
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords
Deep learning, Machine learning, Reinforcement learning, Different stages, ITS applications, Learning schemes, Objective and subjective measures, Social behavior, Social human-robot interactions, Social scenarios, State of the art, Human robot interaction
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-275635 (URN)
10.1109/RO-MAN46459.2019.8956444 (DOI)
000533896300159 ()
2-s2.0-85078868741 (Scopus ID)
Conference
28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 14-18, 2019, New Delhi, India
Note

QC 20200609

Part of ISBN 978-1-7281-2622-7

Available from: 2020-06-09. Created: 2020-06-09. Last updated: 2025-02-09. Bibliographically approved.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-3089-0345
