Publications (10 of 15)
Zhou, S., Hernandez, A. C., Gomez, C., Yin, W. & Björkman, M. (2025). SmartTBD: Smart Tracking for Resource-constrained Object Detection. ACM Transactions on Embedded Computing Systems, 24(2), Article ID 24.
SmartTBD: Smart Tracking for Resource-constrained Object Detection
2025 (English). In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465, Vol. 24, no. 2, article id 24. Journal article (Refereed). Published.
Abstract [en]

With the growing demand for video analysis on mobile devices, object tracking has proven to be a suitable complement to object detection under the Tracking-By-Detection (TBD) paradigm, reducing computational overhead and power demands. However, performing TBD with fixed hyper-parameters leads to computational inefficiency and ignores perceptual dynamics, as fixed setups tend to run suboptimally given the variability of scenarios. In this article, we propose SmartTBD, a scheduling strategy for TBD based on multi-objective optimization of accuracy-latency metrics. SmartTBD is a novel deep reinforcement learning based scheduling architecture that computes appropriate TBD configurations in video sequences to improve the speed and detection accuracy. This involves a challenging optimization problem due to the intrinsic relation between the video characteristics and the TBD performance. Therefore, we leverage video characteristics, frame information, and past TBD results to drive the optimization. Our approach surpasses baselines with fixed TBD configurations as well as recent research, achieving accuracy comparable to pure detection while significantly reducing latency. Moreover, it enables performance analysis of tracking and detection in diverse scenarios. The method is shown to be generalizable and highly practical on common video analytics datasets, on resource-constrained devices.
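
As a rough illustration of the scheduling idea described in the abstract, the sketch below implements a tiny epsilon-greedy scheduler that picks a tracking-by-detection configuration per video segment and updates its value estimates from a scalarized accuracy-latency reward. This is only a sketch under assumed details: the configuration fields (detect_every, resolution), the discrete state buckets, and the reward weight lam are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

CONFIGS = [
    {"detect_every": 1, "resolution": 640},   # detector-heavy: accurate but slow
    {"detect_every": 5, "resolution": 640},   # mixed
    {"detect_every": 10, "resolution": 320},  # tracker-heavy: fast but less accurate
]

q_values = defaultdict(lambda: [0.0] * len(CONFIGS))    # state -> value estimate per config
visit_counts = defaultdict(lambda: [0] * len(CONFIGS))

def choose_config(state, eps=0.1):
    """Epsilon-greedy choice of a TBD configuration for the next video segment."""
    if random.random() < eps:
        return random.randrange(len(CONFIGS))
    return max(range(len(CONFIGS)), key=lambda a: q_values[state][a])

def update(state, action, accuracy, latency_ms, lam=0.01):
    """Update the value estimate with a scalarized accuracy-latency reward."""
    reward = accuracy - lam * latency_ms
    visit_counts[state][action] += 1
    n = visit_counts[state][action]
    q_values[state][action] += (reward - q_values[state][action]) / n   # running mean

# Toy usage: the state summarizes recent video dynamics and past TBD quality.
state = ("high_motion", "tracker_drifting")
action = choose_config(state)
update(state, action, accuracy=0.72, latency_ms=35.0)
print(CONFIGS[action], q_values[state])
```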

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Mobile vision, tracking-by-detection, scheduling
National subject category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-362957 (URN), 10.1145/3703912 (DOI), 001454951000008 (), 2-s2.0-105003605284 (Scopus ID)
Note

QC 20250505

Available from: 2025-05-05. Created: 2025-05-05. Last updated: 2025-05-27. Bibliographically reviewed.
Yin, W. (2024). Developing Data-Driven Models for Understanding Human Motion. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Developing Data-Driven Models for Understanding Human Motion
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Humans are the primary subjects of interest in the realm of computer vision. Specifically, perceiving, generating, and understanding human activities have long been a core pursuit of machine intelligence. Over the past few decades, data-driven methods for modeling human motion have demonstrated great potential across various interactive media and social robotics domains. Despite these impressive achievements, challenges remain in analyzing multi-agent/multi-modal behaviors and in producing high-fidelity and highly varied motions. This complexity arises because human motion is inherently dynamic, uncertain, and intertwined with its environment. This thesis introduces the challenges of and data-driven methods for understanding human motion, and then elaborates on the contributions of the included papers. We present this thesis mainly in ascending order of complexity: recognition, synthesis, and transfer, which cover the tasks of perceiving, generating, and understanding human activities.

Firstly, we present methods to recognize human motion (Paper A). We consider a conversational group scenario where people gather and stand in an environment to converse. Based on transformer-based networks and graph convolutional neural networks, we demonstrate how spatial-temporal group dynamics can be modeled and perceived on both the individual and group levels. Secondly, we investigate probabilistic autoregressive approaches to generate controllable human locomotion. We employ deep generative models, namely normalizing flows (Paper B) and diffusion models (Paper C), to generate and reconstruct the 3D skeletal poses of humans over time. Finally, we deal with the problem of motion style transfer. We propose style transfer systems that allow transforming motion styles while attempting to preserve motion context through GAN-based (Paper D) and diffusion-based (Paper E) methods. Compared with previous research mainly focusing on simple locomotion or exercise, we consider more complex dance movements and multimodal information. 

In summary, this thesis aims to propose methods that can effectively perceive, generate, and transfer 3D human motion. In terms of network architectures, we employ a graph formulation to exploit the correlation of human skeletons, thereby introducing inductive bias through graph structures. Additionally, we leverage transformers to handle long-term data dependencies and weigh the importance of varying data components. In terms of learning frameworks, we adopt generative models to represent joint distributions over relevant variables and multiple modalities, which are flexible enough to cover a wide range of tasks. Our experiments demonstrate the effectiveness of the proposed frameworks by evaluating the methods on our own collected dataset and on public datasets. We show how these methods are applied to various challenging tasks.

Abstract [sv]

(The Swedish abstract is a translation of the English abstract above.)

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. pp. xiii, 68
Series
TRITA-EECS-AVL ; 2024:9
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-342366 (URN), 978-91-8040-815-8 (ISBN)
Public defence
2024-02-16, https://kth-se.zoom.us/j/62347635904, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20240117

Available from: 2024-01-17. Created: 2024-01-16. Last updated: 2024-02-05. Bibliographically reviewed.
Yin, W., Yu, Y., Yin, H., Kragic, D. & Björkman, M. (2024). Scalable Motion Style Transfer with Constrained Diffusion Generation. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence: . Paper presented at The 38th Annual AAAI Conference on Artificial Intelligence, February 20-27, 2024, Vancouver, Canada (pp. 10234-10242). Association for the Advancement of Artificial Intelligence (AAAI), 38
Scalable Motion Style Transfer with Constrained Diffusion Generation
2024 (English). In: Proceedings of the 38th AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), 2024, Vol. 38, pp. 10234-10242. Conference paper, Published paper (Refereed).
Abstract [en]

Current training of motion style transfer systems relies on consistency losses across style domains to preserve content, which hinders scalable application to a large number of domains and to private data. Recent image transfer works show the potential of independent training on each domain by leveraging implicit bridging between diffusion models, with content preservation, however, limited to simple data patterns. We address this by imposing biased sampling in backward diffusion while maintaining domain independence in the training stage. We construct the bias from the source domain keyframes and apply it as the gradient of content constraints, yielding a framework with keyframe manifold constraint gradients (KMCGs). Our validation demonstrates the success of training separate models to transfer between as many as ten dance motion styles. Comprehensive experiments find a significant improvement in preserving motion content in comparison to baseline and ablative diffusion-based style transfer models. In addition, we perform a human study for a subjective assessment of the quality of generated dance motions. The results validate the competitiveness of KMCGs.
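
To make the constrained-sampling idea concrete, here is a minimal sketch of a reverse-diffusion step biased by the gradient of a keyframe-matching loss, in the spirit of the keyframe manifold constraint gradients described above. The denoiser eps_model, the noise schedule, and the guidance scale are placeholder assumptions, not the paper's released implementation.

```python
import torch

def keyframe_loss(x, keyframes, frame_idx):
    # Content constraint: penalize deviation from source-domain keyframes
    # at the selected frame indices.
    return ((x[:, frame_idx] - keyframes) ** 2).mean()

def reverse_step(x_t, t, eps_model, alpha, alpha_bar):
    # One plain DDPM denoising step (simplified variance choice).
    eps = eps_model(x_t, t)
    mean = (x_t - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
    if t == 0:
        return mean
    return mean + (1 - alpha[t]).sqrt() * torch.randn_like(x_t)

def guided_step(x_t, t, eps_model, alpha, alpha_bar, keyframes, frame_idx, scale=0.1):
    # Bias the sample toward the keyframe constraint via the loss gradient.
    with torch.no_grad():
        x_prev = reverse_step(x_t, t, eps_model, alpha, alpha_bar)
    x_tmp = x_prev.detach().requires_grad_(True)
    grad, = torch.autograd.grad(keyframe_loss(x_tmp, keyframes, frame_idx), x_tmp)
    return (x_prev - scale * grad).detach()

# Toy usage with a dummy denoiser; motions are (batch, frames, features).
eps_model = lambda x, t: torch.zeros_like(x)
alpha = torch.full((100,), 0.99)
alpha_bar = torch.cumprod(alpha, dim=0)
x = torch.randn(2, 30, 63)
keyframes, frame_idx = torch.randn(2, 3, 63), torch.tensor([0, 15, 29])
for t in reversed(range(100)):
    x = guided_step(x, t, eps_model, alpha, alpha_bar, keyframes, frame_idx)
print(x.shape)
```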

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence (AAAI), 2024
National subject category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-342365 (URN), 10.1609/aaai.v38i9.28889 (DOI), 001241512400092 (), 2-s2.0-85189340183 (Scopus ID)
Conference
The 38th Annual AAAI Conference on Artificial Intelligence, February 20-27, 2024, Vancouver, Canada
Note

QC 20241112

Available from: 2024-01-16. Created: 2024-01-16. Last updated: 2024-11-12. Bibliographically reviewed.
Fu, J., Tan, J., Yin, W., Pashami, S. & Björkman, M. (2023). Component attention network for multimodal dance improvisation recognition. In: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023: . Paper presented at 25th International Conference on Multimodal Interaction (ICMI), OCT 09-13, 2023, Sorbonne Univ, Paris, FRANCE (pp. 114-118). Association for Computing Machinery (ACM)
Component attention network for multimodal dance improvisation recognition
2023 (English). In: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, Association for Computing Machinery (ACM), 2023, pp. 114-118. Conference paper, Published paper (Refereed).
Abstract [en]

Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data of other modalities, such as audio, can be recorded and benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, component attention network (CANet), for multimodal fusion on three levels: 1) feature fusion with CANet, 2) model fusion with CANet and graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in different fusion methods and distinguish critical temporal or component features. We show that our proposed model outperforms the two baseline methods, demonstrating its potential for analyzing improvisation in dance.
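
The fusion idea can be illustrated with a small attention-weighted combination of per-modality embeddings. The sketch below is a deliberately simplified stand-in, not the published CANet architecture; the dimensions and the two-modality setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ComponentAttentionFusion(nn.Module):
    """Attention-weighted fusion over per-modality component embeddings."""

    def __init__(self, dim, n_classes):
        super().__init__()
        self.score = nn.Linear(dim, 1)            # one attention score per component
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, components):                # components: (batch, n_components, dim)
        weights = torch.softmax(self.score(components).squeeze(-1), dim=-1)
        fused = (weights.unsqueeze(-1) * components).sum(dim=1)
        return self.classifier(fused), weights

# Toy usage: fuse a skeleton embedding and an audio embedding for each clip.
model = ComponentAttentionFusion(dim=64, n_classes=4)
skeleton, audio = torch.randn(8, 64), torch.randn(8, 64)
logits, weights = model(torch.stack([skeleton, audio], dim=1))
print(logits.shape, weights.shape)                # torch.Size([8, 4]) torch.Size([8, 2])
```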

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
Dance Recognition, Multimodal Fusion, Attention Network
National subject category
Other Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-343780 (URN), 10.1145/3577190.3614114 (DOI), 001147764700016 (), 2-s2.0-85175844284 (Scopus ID)
Conference
25th International Conference on Multimodal Interaction (ICMI), OCT 09-13, 2023, Sorbonne Univ, Paris, FRANCE
Note

Part of proceedings ISBN 979-8-4007-0055-2

QC 20240222

Available from: 2024-02-22. Created: 2024-02-22. Last updated: 2024-03-05. Bibliographically reviewed.
Yin, W., Tu, R., Yin, H., Kragic, D., Kjellström, H. & Björkman, M. (2023). Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN: . Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 1102-1108). Institute of Electrical and Electronics Engineers (IEEE)
Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models
2023 (English). In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 1102-1108. Conference paper, Published paper (Refereed).
Abstract [en]

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.
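
The diffusion-based data dropout mentioned in the abstract can be sketched as follows: instead of zeroing dropped frames, randomly selected frames of the conditioning motion are replaced by their forward-diffused (noised) versions. The drop rate, noise schedule, and tensor shapes below are illustrative assumptions, not the paper's implementation.

```python
import torch

def diffusion_dropout(x0, alpha_bar, drop_prob=0.2):
    """Replace a random subset of frames with their forward-diffused versions.

    x0: clean motion (batch, frames, features); alpha_bar: cumulative noise schedule (T,).
    """
    b, f, _ = x0.shape
    t = torch.randint(0, alpha_bar.numel(), (b, f))          # per-frame noise level
    a = alpha_bar[t].unsqueeze(-1)                            # (batch, frames, 1)
    x_noised = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    mask = (torch.rand(b, f, 1) < drop_prob).float()
    return (1 - mask) * x0 + mask * x_noised                  # noise only the dropped frames

# Toy usage on a 21-joint skeleton sequence.
x0 = torch.randn(4, 30, 63)
alpha_bar = torch.linspace(0.999, 0.01, 100)
print(diffusion_dropout(x0, alpha_bar).shape)
```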

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
National subject category
Computer Graphics and Computer Vision
Identifiers
urn:nbn:se:kth:diva-341978 (URN), 10.1109/RO-MAN57019.2023.10309317 (DOI), 001108678600131 (), 2-s2.0-85186990309 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

QC 20240110

Available from: 2024-01-10. Created: 2024-01-10. Last updated: 2025-02-07. Bibliographically reviewed.
Yin, W., Yin, H., Baraka, K., Kragic, D. & Björkman, M. (2023). Dance Style Transfer with Cross-modal Transformer. In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV): . Paper presented at 23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI (pp. 5047-5056). Institute of Electrical and Electronics Engineers (IEEE)
Dance Style Transfer with Cross-modal Transformer
2023 (English). In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 5047-5056. Conference paper, Published paper (Refereed).
Abstract [en]

We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation. 
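
The sequence length-based curriculum learning mentioned above can be sketched as a schedule that trains on short motion windows first and grows the window over epochs. The particular lengths and growth factor below are illustrative assumptions, not the paper's settings.

```python
def curriculum_length(epoch, start_len=32, max_len=256, grow_every=10, factor=2):
    """Window length for this epoch: short clips first, doubling every few epochs."""
    return min(start_len * factor ** (epoch // grow_every), max_len)

def crop_batch(batch, length):
    """Crop each motion sequence (frames, features) to the current curriculum length."""
    return [seq[:length] for seq in batch]

for epoch in (0, 10, 20, 30):
    print(epoch, curriculum_length(epoch))   # 32, 64, 128, 256
```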

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Winter Conference on Applications of Computer Vision, ISSN 2472-6737
National subject category
Computer Graphics and Computer Vision
Identifiers
urn:nbn:se:kth:diva-333220 (URN), 10.1109/WACV56688.2023.00503 (DOI), 000971500205016 (), 2-s2.0-85149044034 (Scopus ID)
Conference
23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI
Note

QC 20230731

Available from: 2023-07-31. Created: 2023-07-31. Last updated: 2025-02-07. Bibliographically reviewed.
Yang, F., Yin, W., Wang, L., Li, T., Zhao, P., Liu, B., . . . Zhang, D. (2023). Diffusion-Based Time Series Data Imputation for Cloud Failure Prediction at Microsoft 365. In: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering: . Paper presented at 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, United States of America, Dec 3 2023 - Dec 9 2023 (pp. 2050-2055). Association for Computing Machinery (ACM)
Diffusion-Based Time Series Data Imputation for Cloud Failure Prediction at Microsoft 365
2023 (English). In: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Association for Computing Machinery (ACM), 2023, pp. 2050-2055. Conference paper, Published paper (Refereed).
Abstract [en]

Ensuring reliability in large-scale cloud systems like Microsoft 365 is crucial. Cloud failures, such as disk and node failures, threaten service reliability, causing service interruptions and financial loss. Existing works focus on failure prediction and on proactively taking action before failures happen. However, they suffer from poor data quality, such as missing data in model training and prediction, which limits performance. In this paper, we focus on enhancing data quality through data imputation with the proposed Diffusion+, a sample-efficient diffusion model that imputes the missing data efficiently, conditioned on the observed data. Experiments with industrial datasets and application practice show that our model contributes to improving the performance of downstream failure prediction.
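
A minimal sketch of mask-conditioned diffusion imputation in this spirit is shown below: observed values are kept fixed while missing entries are iteratively denoised. The denoiser call signature, the noise schedule, and the shapes are placeholder assumptions, not the Diffusion+ model.

```python
import torch

def impute(x_obs, mask, eps_model, alpha, alpha_bar, steps):
    """Denoise missing entries while re-imposing observed values at every step.

    x_obs: observed series with zeros at missing points; mask: 1 where observed.
    """
    x = torch.randn_like(x_obs)
    for t in reversed(range(steps)):
        eps = eps_model(x, t, mask)
        x = (x - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
        if t > 0:
            x = x + (1 - alpha[t]).sqrt() * torch.randn_like(x)
        x = mask * x_obs + (1 - mask) * x          # keep observed entries fixed
    return x

# Toy usage with a dummy denoiser on a (batch, time, signals) series.
eps_model = lambda x, t, mask: torch.zeros_like(x)
alpha = torch.full((50,), 0.98)
alpha_bar = torch.cumprod(alpha, dim=0)
series = torch.randn(4, 120, 8)
mask = (torch.rand_like(series) > 0.3).float()     # roughly 30% missing
print(impute(series * mask, mask, eps_model, alpha, alpha_bar, steps=50).shape)
```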

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
Diffusion model, disk failure prediction, missing data imputation
National subject category
Software Engineering; Other Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-341954 (URN), 10.1145/3611643.3613866 (DOI), 001148157800169 (), 2-s2.0-85180547809 (Scopus ID)
Conference
31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, United States of America, Dec 3 2023 - Dec 9 2023
Note

Part of ISBN 9798400703270

QC 20240108

Available from: 2024-01-08. Created: 2024-01-08. Last updated: 2025-07-02. Bibliographically reviewed.
Yin, W., Yin, H., Baraka, K., Kragic, D. & Björkman, M. (2023). Multimodal dance style transfer. Machine Vision and Applications, 34(4), Article ID 48.
Multimodal dance style transfer
2023 (English). In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 34, no. 4, article id 48. Journal article (Refereed). Published.
Abstract [en]

This paper first presents CycleDance, a novel dance style transfer system that transforms an existing motion clip in one dance style into a motion clip in another dance style while attempting to preserve the motion context of the dance. CycleDance extends existing CycleGAN architectures with multimodal transformer encoders to account for the music context. We adopt a sequence length-based curriculum learning strategy to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. Building upon CycleDance, we further propose StarDance, which enables many-to-many mappings across different styles using a single generator network. Additionally, we introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. To evaluate the performance of our approach, we perform an extensive ablation study and a human study with 30 participants, each with 5 or more years of dance experience. Our experimental results show that our approach can generate realistic movements with the target style, outperforming the baseline CycleGAN and its variants on naturalness, transfer strength, and content preservation. Our proposed approach has potential applications in choreography, gaming, animation, and tool development for artistic and scientific innovations in the field of dance.
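
The many-to-many idea attributed to StarDance can be sketched as a single generator conditioned on a target style label, so that one network serves all style pairs. The embedding-plus-MLP generator below is an illustrative stand-in under that assumption, not the published architecture.

```python
import torch
import torch.nn as nn

class StyleConditionedGenerator(nn.Module):
    """One generator for all style pairs: the target style is an input label."""

    def __init__(self, motion_dim, n_styles, hidden=128):
        super().__init__()
        self.style_emb = nn.Embedding(n_styles, hidden)
        self.net = nn.Sequential(
            nn.Linear(motion_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, motion, target_style):       # motion: (batch, frames, motion_dim)
        s = self.style_emb(target_style)            # (batch, hidden)
        s = s.unsqueeze(1).expand(-1, motion.size(1), -1)
        return self.net(torch.cat([motion, s], dim=-1))

# Toy usage: transfer two clips to styles 3 and 7 with the same network.
gen = StyleConditionedGenerator(motion_dim=63, n_styles=10)
out = gen(torch.randn(2, 30, 63), torch.tensor([3, 7]))
print(out.shape)   # torch.Size([2, 30, 63])
```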

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Style transfer, Dance motion, Multimodal learning, Generative models
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-328307 (URN), 10.1007/s00138-023-01399-x (DOI), 000984951800001 (), 2-s2.0-85158999932 (Scopus ID)
Note

QC 20230607

Available from: 2023-06-07. Created: 2023-06-07. Last updated: 2024-01-17. Bibliographically reviewed.
Demir Kanik, S. U., Yin, W., Güneysu Özgür, A., Ghadirzadeh, A., Björkman, M. & Kragic, D. (2022). Improving EEG-based Motor Execution Classification for Robot Control. In: Proceedings 14th International Conference, SCSM 2022, Held as Part of the 24th HCI International Conference, HCII 2022: Social Computing and Social Media: Design, User Experience and Impact. Paper presented at Social Computing and Social Media: Design, User Experience and Impact - 14th International Conference, SCSM 2022, Held as Part of the 24th HCI International Conference, HCII 2022, Virtual Event, June 26 - July 1, 2022 (pp. 65-82). Springer Nature
Improving EEG-based Motor Execution Classification for Robot Control
2022 (English). In: Proceedings 14th International Conference, SCSM 2022, Held as Part of the 24th HCI International Conference, HCII 2022: Social Computing and Social Media: Design, User Experience and Impact, Springer Nature, 2022, pp. 65-82. Conference paper, Published paper (Refereed).
Abstract [en]

Brain Computer Interface (BCI) systems have the potential to provide a communication tool using non-invasive signals, which can be applied to various fields including neuro-rehabilitation and entertainment. Interpreting multi-class movement intentions in a real-time setting to control external devices such as robotic arms remains one of the main challenges in the BCI field. We propose a learning framework to decode upper limb movement intentions before and during movement execution (ME), with the inclusion of motor imagery (MI) trials. The EEG signals collected during MI and ME trials are fed into a hybrid architecture consisting of Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) with limited pre-processing. The proposed approach shows the potential to anticipate the intended movement direction before the onset of the movement, while waiting to reach a certainty level, possibly by observing more EEG data from the beginning of the actual movement, before sending control commands to the robot, so as to avoid undesired outcomes. The presented results indicate that both the accuracy and the confidence level of the model improve with the introduction of MI trials right before movement execution. Our results confirm that the proposed model can contribute to real-time and continuous decoding of movement directions for robotic applications.
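
A minimal sketch of the hybrid CNN-LSTM classifier and the "wait until confident" behaviour described above: a command is issued only when the softmax confidence exceeds a threshold, otherwise the system keeps observing. Channel counts, layer sizes, and the threshold are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class EEGClassifier(nn.Module):
    """Temporal CNN followed by an LSTM over EEG windows, ending in class logits."""

    def __init__(self, n_channels=32, n_classes=4, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_channels, hidden, 7, padding=3), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)    # (batch, time, hidden)
        _, (hn, _) = self.lstm(h)
        return self.head(hn[-1])

def decide(logits, threshold=0.8):
    """Return a movement direction per trial, or None to keep observing EEG."""
    probs = torch.softmax(logits, dim=-1)
    conf, direction = probs.max(dim=-1)
    return [int(d) if c >= threshold else None for c, d in zip(conf, direction)]

model = EEGClassifier()
print(decide(model(torch.randn(2, 32, 250))))   # None means "not confident yet"
```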

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13315
Keywords
brain computer interface
National subject category
Neurosciences; Signal Processing; Robotics and Automation
Identifiers
urn:nbn:se:kth:diva-318297 (URN), 10.1007/978-3-031-05061-9_5 (DOI), 000911435700005 (), 2-s2.0-85133032331 (Scopus ID)
Conference
Social Computing and Social Media: Design, User Experience and Impact - 14th International Conference, SCSM 2022, Held as Part of the 24th HCI International Conference, HCII 2022, Virtual Event, June 26 - July 1, 2022
Note

QC 20230307

Available from: 2022-09-19. Created: 2022-09-19. Last updated: 2025-02-05. Bibliographically reviewed.
Yin, W., Yin, H., Kragic, D. & Björkman, M. (2021). Graph-based Normalizing Flow for Human Motion Generation and Reconstruction. In: 2021 30th IEEE international conference on robot and human interactive communication (RO-MAN): . Paper presented at 30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 08-12, 2021, ELECTR NETWORK (pp. 641-648). Institute of Electrical and Electronics Engineers (IEEE)
Graph-based Normalizing Flow for Human Motion Generation and Reconstruction
2021 (English). In: 2021 30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 641-648. Conference paper, Published paper (Refereed).
Abstract [en]

Data-driven approaches for modeling human skeletal motion have found various applications in interactive media and social robotics. Challenges remain in these fields for generating high-fidelity samples and robustly reconstructing motion from imperfect input data, due to, e.g., missed marker detection. In this paper, we propose a probabilistic generative model to synthesize and reconstruct long-horizon motion sequences conditioned on past information and control signals, such as the path along which an individual is moving. Our method adapts the existing work MoGlow by introducing a new graph-based model. The model leverages the spatial-temporal graph convolutional network (ST-GCN) to effectively capture the spatial structure and temporal correlation of skeletal motion data at multiple scales. We evaluate the models on a mixture of motion capture datasets of human locomotion with foot-step and bone-length analysis. The results demonstrate the advantages of our model in reconstructing missing markers and in achieving comparable results on generating realistic future poses. When the inputs are imperfect, our model shows improved robustness of generation.
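
To illustrate how a skeleton-graph network can parameterize a normalizing flow over poses, the sketch below combines a simple graph convolution with an affine coupling layer. The adjacency matrix, feature split, and layer sizes are illustrative assumptions, not the paper's MoGlow-based model.

```python
import torch
import torch.nn as nn

class GraphCoupling(nn.Module):
    """Affine coupling layer whose scale/shift come from a graph convolution."""

    def __init__(self, adjacency, feat_dim):
        super().__init__()
        self.register_buffer("A", adjacency)               # (joints, joints), row-normalized
        self.weight = nn.Linear(feat_dim // 2, feat_dim)    # outputs scale and shift

    def forward(self, x):                                   # x: (batch, joints, feat_dim)
        x_a, x_b = x.chunk(2, dim=-1)
        h = torch.einsum("ij,bjf->bif", self.A, x_a)        # graph convolution on kept half
        log_s, t = self.weight(h).chunk(2, dim=-1)
        y_b = x_b * log_s.exp() + t                         # invertible affine transform
        log_det = log_s.sum(dim=(1, 2))
        return torch.cat([x_a, y_b], dim=-1), log_det

# Toy usage with a placeholder skeleton adjacency (identity here).
joints, feat = 21, 6
layer = GraphCoupling(torch.eye(joints), feat)
y, log_det = layer(torch.randn(4, joints, feat))
print(y.shape, log_det.shape)
```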

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE RO-MAN, ISSN 1944-9445
National subject category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-305502 (URN), 10.1109/RO-MAN50785.2021.9515316 (DOI), 000709817200093 (), 2-s2.0-85115049506 (Scopus ID)
Conference
30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 08-12, 2021, ELECTR NETWORK
Note

QC 20211201

Part of proceedings: ISBN 978-1-6654-0492-1

Available from: 2021-12-01. Created: 2021-12-01. Last updated: 2024-01-17. Bibliographically reviewed.
Identifiers
ORCID iD: orcid.org/0000-0002-7189-1336
