kth.se Publications
Publications (10 of 50)
Sheikholeslami, S., Ghasemirahni, H., Payberah, A. H., Wang, T., Dowling, J. & Vlassov, V. (2025). Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning. Paper presented at The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys). ACM Digital Library.
Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning
2025 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning (ML) and Deep Learning (DL) research, ablation studies are typically performed to provide insights into the individual contributions of the building blocks and components of an ML/DL system (e.g., a deep neural network), and to justify that certain additions or modifications to an existing ML/DL system yield the claimed performance improvements. Although dedicated frameworks for performing ablation studies have been introduced in recent years, conducting such experiments still requires tedious, repetitive work, typically involving the maintenance of nearly identical versions of code corresponding to different ablation trials. Inspired by the recent promising performance of Large Language Models (LLMs) in the generation and analysis of ML/DL code, in this paper we discuss the potential of LLMs as facilitators of ablation study experiments for scientific research projects that involve ML and DL models. We first discuss the different ways in which LLMs can be utilized for ablation studies and then present a prototype of a tool called AblationMage, which leverages LLMs to semi-automate the overall process of conducting ablation study experiments. We showcase the usability of AblationMage through three experiments, including one in which we reproduce the ablation studies from a recently published applied DL paper.
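As a rough illustration of the kind of semi-automation the abstract describes, the sketch below asks an LLM to emit an ablated variant of a training script, so that near-identical copies of the code need not be maintained by hand. It uses the OpenAI Python SDK; the prompt wording, the model name, and the helper itself are assumptions for illustration, not AblationMage's actual interface.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_ablation_variant(source_code: str, component: str) -> str:
    # Ask the LLM to rewrite the script with one component removed;
    # both the prompt and this helper are illustrative assumptions.
    prompt = (
        f"Rewrite the following training script with the component "
        f"'{component}' removed or disabled, keeping everything else "
        f"intact. Return only the modified code.\n\n{source_code}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# e.g. generate_ablation_variant(open("train.py").read(), "dropout layers")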

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
Ablation Studies, Deep Learning, Feature Ablation, Model Ablation, Large Language Models
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360719 (URN), 10.1145/3721146.3721957 (DOI), 001477868300025, 2-s2.0-105003634645 (Scopus ID)
Conference
The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys)
Funder
Vinnova, 2016-05193
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01
Karimi, S., Asadi, S. & Payberah, A. H. (2024). BaziGooshi: A Hybrid Model of Reinforcement Learning for Generalization in Gameplay. IEEE Transactions on Games, 16(3), 722-734
BaziGooshi: A Hybrid Model of Reinforcement Learning for Generalization in Gameplay
2024 (English) In: IEEE Transactions on Games, ISSN 2475-1502, E-ISSN 2475-1510, Vol. 16, no. 3, p. 722-734. Article in journal (Refereed) Published
Abstract [en]

While reinforcement learning (RL) is gaining popularity in gameplay, creating a generalized RL model remains challenging. This study presents BaziGooshi, a generalized RL solution for games, focusing on two different types of games: (i) the puzzle game Candy Crush Friends Saga and (ii) the platform game Sonic the Hedgehog Genesis. BaziGooshi rewards RL agents for mastering a set of intrinsic basic skills as well as for achieving the game objectives. The solution includes a hybrid model that combines several agents pretrained using intrinsic or extrinsic rewards to determine the actions, and we propose an RL-based method for assigning weights to the pretrained agents. Through experiments, we show that the RL-based approach improves generalization to unseen levels and that BaziGooshi surpasses the performance of most of the defined baselines in both games. We also perform additional experiments to further investigate the impact of intrinsic rewards and the effects of different agent combinations in the proposed hybrid model.
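The hybrid idea of weighting pretrained agents can be pictured with a short sketch: a small network, trainable with RL, produces per-agent weights, and the agents' action distributions are mixed accordingly. The policy modules, dimensions, and weighting network below are placeholders, not the paper's actual architectures.

import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    # Mixes action distributions of frozen pretrained agents (trained with
    # intrinsic or extrinsic rewards) using state-dependent learned weights.
    def __init__(self, pretrained_policies, obs_dim):
        super().__init__()
        self.policies = pretrained_policies  # frozen; each maps obs -> action probs
        self.weight_net = nn.Sequential(     # trained with RL to weight the agents
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, len(pretrained_policies)),
        )

    def forward(self, obs):
        weights = torch.softmax(self.weight_net(obs), dim=-1)        # (batch, n_agents)
        probs = torch.stack([p(obs) for p in self.policies], dim=1)  # (batch, n_agents, n_actions)
        return (weights.unsqueeze(-1) * probs).sum(dim=1)            # mixed distribution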

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Games, Color, Training, Green products, Encoding, Shape, Reinforcement learning, Deep reinforcement learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-355137 (URN), 10.1109/TG.2024.3355172 (DOI), 001319570900012, 2-s2.0-85184322584 (Scopus ID)
Note

QC 20241023

Available from: 2024-10-23 Created: 2024-10-23 Last updated: 2025-08-28. Bibliographically approved
Sheikholeslami, S., Wang, T., Payberah, A. H., Dowling, J. & Vlassov, V. (2024). Deep Neural Network Weight Initialization from Hyperparameter Tuning Trials. In: Neural Information Processing. Paper presented at ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand. Springer Nature.
Deep Neural Network Weight Initialization from Hyperparameter Tuning Trials
2024 (English) In: Neural Information Processing, Springer Nature, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Training a deep neural network from scratch requires initializing the network weights as a first step. Over the years, many policies and techniques for weight initialization have been proposed and widely used, including Kaiming initialization and different variants of random initialization. Another requirement for starting the training stage is choosing and setting suitable hyperparameter values, which are usually obtained by performing several hyperparameter tuning trials. In this paper, we study the suitability of initializing weights with weights obtained from different epochs of hyperparameter tuning trials, and compare it to Kaiming uniform (random) weight initialization for image classification tasks. Based on an experimental evaluation using ResNet-18, ResNet-152, and InceptionV3 models, and the CIFAR-10, CIFAR-100, Tiny ImageNet, and Food-101 datasets, we show that weight initialization from hyperparameter tuning trials can speed up the training of deep neural networks by up to 2x while maintaining or improving the best test accuracy of the trained models, compared to random initialization.
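The comparison in the abstract can be sketched in a few lines of PyTorch: one model starts from Kaiming uniform (random) initialization, the other loads weights captured at some epoch of an earlier hyperparameter tuning trial. The checkpoint filename is a placeholder.

import torch
import torch.nn as nn
from torchvision.models import resnet18

def init_kaiming(model: nn.Module) -> nn.Module:
    # Kaiming uniform (random) baseline initialization.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
    return model

def init_from_trial(model: nn.Module, ckpt_path: str) -> nn.Module:
    # Warm start: weights saved during a hyperparameter tuning trial.
    model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
    return model

baseline   = init_kaiming(resnet18(num_classes=10))
warm_start = init_from_trial(resnet18(num_classes=10), "hpo_trial3_epoch5.pt")  # placeholder path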

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
weight initialization, deep neural network training, hyperparameter tuning, model training, deep learning
National Category
Computer Sciences; Artificial Intelligence
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-358848 (URN), 10.1007/978-981-96-6954-7_5 (DOI)
Conference
ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01. Bibliographically approved
Pena, F. J., Hübinger, C., Payberah, A. H. & Jaramillo, F. (2024). DEEPAQUA: Semantic segmentation of wetland water surfaces with SAR imagery using deep neural networks without manually annotated data. International Journal of Applied Earth Observation and Geoinformation, 126, Article ID 103624.
DEEPAQUA: Semantic segmentation of wetland water surfaces with SAR imagery using deep neural networks without manually annotated data
2024 (English) In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 126, article id 103624. Article in journal (Refereed) Published
Abstract [en]

Deep learning and remote sensing techniques have significantly advanced water surface monitoring; however, the need for annotated data remains a challenge. This is particularly problematic in wetland detection, where water extent varies over time and space, demanding multiple annotations for the same area. In this paper, we present DEEPAQUA, a deep learning model inspired by knowledge distillation (a.k.a. the teacher-student model) that generates labeled data automatically and eliminates the need for manual annotations during the training phase. We utilize the Normalized Difference Water Index (NDWI) as a teacher model to train a Convolutional Neural Network (CNN) for segmenting water from Synthetic Aperture Radar (SAR) images. To train the student model, we exploit cases where optical- and radar-based water masks coincide, enabling the detection of both open and vegetated water surfaces. DEEPAQUA represents a significant advancement in computer vision techniques for water detection by effectively training semantic segmentation models without any manually annotated data. Experimental results show that DEEPAQUA outperforms other unsupervised methods, improving accuracy by 3%, Intersection over Union by 11%, and F1-score by 6%. This approach offers a practical solution for monitoring changes in wetland water extent without the need for ground-truth data, making it highly adaptable and scalable for wetland monitoring.
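The teacher side of the pipeline is straightforward to sketch: NDWI is computed from optical bands and thresholded into a water mask, which then serves as a pseudo-label for the SAR segmentation student. The band arrays and the zero threshold are assumptions for illustration.

import numpy as np

def ndwi_water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    # NDWI = (Green - NIR) / (Green + NIR); pixels above the threshold count as water.
    ndwi = (green - nir) / (green + nir + 1e-8)  # epsilon avoids division by zero
    return ndwi > threshold

# Where this optical mask coincides with a radar-based mask, the agreed pixels
# become training targets for the CNN that segments the SAR image.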

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Automated data labeling, CNN, Deep learning, Remote sensing, Semantic segmentation, Vegetated water, Wetland mapping
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-341922 (URN), 10.1016/j.jag.2023.103624 (DOI), 001142090200001, 2-s2.0-85180567035 (Scopus ID)
Note

QC 20240108

Available from: 2024-01-08 Created: 2024-01-08 Last updated: 2025-02-07. Bibliographically approved
Kamalian, M., Taherkordi, A., Payberah, A. H. & Ferreira, P. (2024). FogFLeeT: Fog-Level Federated Transfer Learning for Adaptive Transport Mode Detection. In: Proceedings - 2024 IEEE International Conference on Cloud Engineering, IC2E 2024: . Paper presented at 12th IEEE International Conference on Cloud Engineering, IC2E 2024, Paphos, Cyprus, September 24-27, 2024 (pp. 22-33). Institute of Electrical and Electronics Engineers (IEEE)
FogFLeeT: Fog-Level Federated Transfer Learning for Adaptive Transport Mode Detection
2024 (English) In: Proceedings - 2024 IEEE International Conference on Cloud Engineering, IC2E 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 22-33. Conference paper, Published paper (Refereed)
Abstract [en]

Transport Mode Detection (TMD) systems play a pivotal role in facilitating applications in transport, urban planning, and more. Exploiting the advancements in smartphone sensing capabilities, TMD systems have evolved toward mobile applications, with local classification on smartphones as a common approach. Yet, such approaches often rely on centralized training, which raises privacy concerns due to the transmission of sensitive data (e.g., GPS logs) over the Internet. In this paper, we propose FogFLeeT, a novel Federated Transfer Learning (FTL) framework for TMD that addresses both privacy and performance concerns. Our approach relies on Federated Learning (FL) to train a global model on various datasets from different cities, while employing transfer learning to adapt the global model to the specific characteristics of individual smartphones and cities. FogFLeeT relies on an architecture that integrates edge, fog, and cloud layers, with a dedicated fog node for each city to simplify cross-silo federated learning. Experimental results demonstrate the effectiveness of the FogFLeeT framework, which achieves TMD accuracy up to 20% higher than a comparable centralized approach; furthermore, it outperforms the FL solutions reported in the literature with at least an 8% increase in accuracy. In this work, we also highlight the importance of sufficient training data for distributed training and discuss the impact of smartphone sensor quality on the performance of TMD systems. Our work contributes to advancing TMD systems by providing an adaptive and privacy-preserving solution suitable for deployment in diverse urban environments and across various geographical locations.
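The federated side can be pictured with a minimal FedAvg-style aggregation, as a per-city fog node might run it over its smartphones' model updates; the weighting by local sample counts and the helper itself are generic illustrations, not FogFLeeT's exact protocol.

import torch

def federated_average(state_dicts, num_samples):
    # Average client weights, weighted by each client's local dataset size.
    total = float(sum(num_samples))
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, num_samples))
    return avg

# A fog node aggregates its city's devices; the cloud then combines fog-level
# models into the global TMD model, which each device adapts via transfer learning.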

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Accuracy, Edge Computing, Federated Learning, Fog Computing, Privacy, Transfer Learning, Transport Mode Detection
National Category
Computer Sciences; Communication Systems; Transport Systems and Logistics; Computer Systems
Identifiers
urn:nbn:se:kth:diva-358132 (URN), 10.1109/IC2E61754.2024.00010 (DOI), 001438027700003, 2-s2.0-85212176103 (Scopus ID)
Conference
12th IEEE International Conference on Cloud Engineering, IC2E 2024, Paphos, Cyprus, September 24-27, 2024
Note

Part of ISBN 9798331528690

QC 20250116

Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-05-05. Bibliographically approved
Layegh, A., Payberah, A. H. & Matskin, M. (2024). REA: Refine-Estimate-Answer Prompting for Zero-Shot Relation Extraction. In: Natural Language Processing and Information Systems - 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Proceedings: . Paper presented at 29th International Conference on Natural Language and Information Systems, NLDB 2024, Turin, Italy, Jun 25 2024 - Jun 27 2024 (pp. 301-316). Springer Nature
REA: Refine-Estimate-Answer Prompting for Zero-Shot Relation Extraction
2024 (English) In: Natural Language Processing and Information Systems - 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Proceedings, Springer Nature, 2024, p. 301-316. Conference paper, Published paper (Refereed)
Abstract [en]

Zero-shot relation extraction (RE) presents the challenge of identifying entity relationships in text without training on those specific relations. Despite significant advances in natural language processing driven by large language models (LLMs), their application to zero-shot RE remains less effective than traditional models that fine-tune smaller pre-trained language models. This limitation is attributed to insufficient prompting strategies that fail to leverage the full capabilities of LLMs for zero-shot RE, given the intrinsic complexity of the RE task. A compelling question is whether LLMs can address complex tasks, such as RE, by decomposing them into simpler, distinct subtasks that are easier to manage and solve individually. We propose the Refine-Estimate-Answer (REA) approach to answer this question. This multi-stage prompting strategy decomposes the RE task into more manageable subtasks and applies iterative refinement to guide LLMs through the complex reasoning required for accurate RE. Our research validates the effectiveness of REA through comprehensive testing across multiple public RE datasets, demonstrating marked improvements over existing LLM-based frameworks. Experimental results on the FewRel, Wiki-ZSL, and TACRED datasets show that our proposed approach boosts the vanilla prompting F1 scores by 31.57, 19.52, and 15.39 points, respectively, thereby outperforming state-of-the-art LLM-based methods.
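The decomposition named in the title can be sketched as three chained prompts: refine the input, estimate plausible relations, then answer with one. The llm() stub stands in for any chat-completion call, and the prompt wording is an assumption, not the paper's exact prompts.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion client here")

def rea_extract(sentence: str, head: str, tail: str, relations: list) -> str:
    # Refine: restate the input clearly.
    refined = llm(f"Restate this sentence clearly, keeping all facts: {sentence}")
    # Estimate: shortlist plausible relations.
    shortlist = llm(
        f"Sentence: {refined}\nWhich of these relations could plausibly hold "
        f"between '{head}' and '{tail}'? Options: {', '.join(relations)}"
    )
    # Answer: commit to a single relation.
    return llm(
        f"Sentence: {refined}\nPlausible relations: {shortlist}\n"
        f"Answer with the single most likely relation between '{head}' and '{tail}'."
    )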

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Large Language Models, Prompting Strategy, Relation Extraction
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-354657 (URN), 10.1007/978-3-031-70239-6_21 (DOI), 2-s2.0-85205393225 (Scopus ID)
Conference
29th International Conference on Natural Language and Information Systems, NLDB 2024, Turin, Italy, Jun 25 2024 - Jun 27 2024
Note

Part of ISBN 9783031702389

QC 20241010

Available from: 2024-10-09 Created: 2024-10-09 Last updated: 2025-02-07. Bibliographically approved
Karimi, S., Asadi, S. & Payberah, A. H. (2024). SCORE: Skill-Conditioned Online Reinforcement Learning. Paper presented at 20th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2024, Lexington, United States of America, November 18-22, 2024 (pp. 189-198). Association for the Advancement of Artificial Intelligence (AAAI).
SCORE: Skill-Conditioned Online Reinforcement Learning
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Solving complex long-horizon tasks through Reinforcement Learning (RL) from scratch presents challenges related to efficient exploration. Two common approaches to reducing complexity and enhancing exploration efficiency are (i) integrating learning-from-demonstration techniques with online RL, where prior knowledge acquired from demonstrations is used to guide exploration, refine representations, or tailor reward functions, and (ii) using representation learning to facilitate state abstraction. In this study, we present Skill-Conditioned Online REinforcement Learning (SCORE), a novel approach that leverages both strategies and utilizes skills acquired from an unstructured demonstration dataset in a policy gradient RL algorithm. This integration enriches the algorithm with informative input representations, improving downstream task learning and exploration efficiency. We evaluate our method on long-horizon robotic and navigation tasks and game environments, demonstrating improvements in online RL performance over the baselines. Furthermore, we show our approach's generalization capabilities and analyze its effectiveness through an ablation study.
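Skill conditioning, in the sense used in the abstract, amounts to feeding the policy a skill vector alongside the state; the sketch below shows that shape, with the encoder that would produce the skill vector (e.g., trained on the demonstration dataset) left abstract. Dimensions and layer sizes are placeholders, not the paper's architecture.

import torch
import torch.nn as nn

class SkillConditionedPolicy(nn.Module):
    def __init__(self, state_dim: int, skill_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + skill_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state, skill):
        # The skill embedding, learned from unstructured demonstrations,
        # enriches the input representation used by the policy gradient agent.
        return self.net(torch.cat([state, skill], dim=-1))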

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence (AAAI), 2024
National Category
Computer Sciences; Robotics and automation; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-358212 (URN), 10.1609/aiide.v20i1.31879 (DOI), 2-s2.0-85213057195 (Scopus ID)
Conference
20th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2024, Lexington, United States of America, November 18-22, 2024
Note

Part of ISBN 978-1-57735-895-4

QC 20250116

Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-02-05. Bibliographically approved
Schmidt, F., Hammerfald, K., Jahren, H. H., Payberah, A. H. & Vlassov, V. (2024). Single-pass Hierarchical Text Classification with Large Language Models. In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024: . Paper presented at 2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, Dec 15 2024 - Dec 18 2024 (pp. 5412-5421). Institute of Electrical and Electronics Engineers (IEEE)
Single-pass Hierarchical Text Classification with Large Language Models
2024 (English) In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 5412-5421. Conference paper, Published paper (Refereed)
Abstract [en]

Numerous text classification tasks inherently possess hierarchical structures among classes, which traditional classification paradigms often overlook. This study introduces novel approaches for hierarchical text classification using Large Language Models (LLMs), exploiting taxonomies to improve accuracy and traceability in a zero-shot setting. We propose two hierarchical classification methods, namely (i) single-path and (ii) path-traversal, which both leverage the hierarchical class structures inherent in the target classes (e.g., a bird is a type of animal that belongs to a species) and improve on naïve hierarchical text classification from the literature. We implement them as prompts for generative models such as OpenAI GPTs and benchmark them against discriminative language models (BERT and RoBERTa). We measure classification performance (precision, recall, and F1-score) against computational efficiency (time and cost). Throughout evaluations of the classification methods on two diverse datasets, namely ComFaSyn, containing mental health patients' diary entries, and DBpedia, containing structured information extracted from Wikipedia, we observe that our methods, without any fine-tuning or few-shot examples, achieve results comparable to flat classification and existing methods from the literature, with minimal increases in prompt length and processing time.
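The single-path method, as described, can be sketched as a loop that classifies level by level down the taxonomy, so each prompt only offers the children of the previous choice. The classify() stub stands in for one zero-shot LLM call, and the toy taxonomy is an assumption.

def classify(text: str, options: list) -> str:
    raise NotImplementedError("one zero-shot chat-completion call per level")

def single_path(text: str, taxonomy: dict) -> list:
    # taxonomy maps each class to a dict of its children; {} marks a leaf.
    path, level = [], taxonomy
    while level:
        choice = classify(text, list(level.keys()))
        path.append(choice)
        level = level.get(choice, {})
    return path

# e.g. single_path(doc, {"Animal": {"Bird": {}, "Fish": {}}, "Plant": {}})
# returns the traced path, which gives the traceability the abstract mentions.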

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Hierarchical text classification, Large Language Models (LLMs), zero-shot classification
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-360563 (URN), 10.1109/BigData62323.2024.10825412 (DOI), 2-s2.0-85218008858 (Scopus ID)
Conference
2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, Dec 15 2024 - Dec 18 2024
Note

Part of ISBN 9798350362480

QC 20250226

Available from: 2025-02-26 Created: 2025-02-26 Last updated: 2025-02-26. Bibliographically approved
Layegh, A., Payberah, A. H., Soylu, A., Roman, D. & Matskin, M. (2024). Wiki-based Prompts for Enhancing Relation Extraction using Language Models. In: 39th Annual ACM Symposium on Applied Computing, SAC 2024: . Paper presented at 39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain, Apr 8 2024 - Apr 12 2024 (pp. 731-740). Association for Computing Machinery (ACM)
Wiki-based Prompts for Enhancing Relation Extraction using Language Models
2024 (English) In: 39th Annual ACM Symposium on Applied Computing, SAC 2024, Association for Computing Machinery (ACM), 2024, p. 731-740. Conference paper, Published paper (Refereed)
Abstract [en]

Prompt-tuning and instruction-tuning of language models have shown significant results in few-shot Natural Language Processing (NLP) tasks such as Relation Extraction (RE), which involves identifying relationships between entities within a sentence. However, the effectiveness of these methods relies heavily on the design of the prompts. A compelling question is whether incorporating external knowledge can enhance a language model's understanding of NLP tasks. In this paper, we introduce wiki-based prompt construction, which leverages Wikidata as a source of information to craft more informative prompts for both prompt-tuning and instruction-tuning of language models in RE. Our experiments show that using wiki-based prompts enhances cutting-edge language models in RE, emphasizing their potential for improving RE tasks. Our code and datasets are available on GitHub.
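A wiki-based prompt in this spirit can be sketched by fetching short entity descriptions from Wikidata's public search API and splicing them into the RE prompt; the prompt wording and the helper are assumptions, not the paper's exact construction.

import requests

def wikidata_description(entity: str) -> str:
    # Look up a short English description via the public wbsearchentities endpoint.
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbsearchentities", "search": entity,
                "language": "en", "format": "json"},
        timeout=10,
    )
    hits = resp.json().get("search", [])
    return hits[0].get("description", "") if hits else ""

def wiki_prompt(sentence: str, head: str, tail: str) -> str:
    return (
        f"Sentence: {sentence}\n"
        f"{head}: {wikidata_description(head)}\n"
        f"{tail}: {wikidata_description(tail)}\n"
        f"What is the relation between '{head}' and '{tail}'?"
    )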

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
knowledge integration, language models, prompt construction, relation extraction
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-350722 (URN), 10.1145/3605098.3635949 (DOI), 001236958200108, 2-s2.0-85197687891 (Scopus ID)
Conference
39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain, Apr 8 2024 - Apr 12 2024
Note

Part of ISBN 9798400702433

QC 20240719

Available from: 2024-07-17 Created: 2024-07-17 Last updated: 2025-02-07. Bibliographically approved
Layegh, A., Payberah, A. H., Soylu, A., Roman, D. & Matskin, M. (2023). ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER. In: Proceedings - 2023 IEEE 47th Annual Computers, Software, and Applications Conference, COMPSAC 2023: . Paper presented at 47th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2023, Jun 26 2023 - Jun 30 2023, Hybrid, Torino, Italy (pp. 241-249). Institute of Electrical and Electronics Engineers (IEEE)
ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER
2023 (English) In: Proceedings - 2023 IEEE 47th Annual Computers, Software, and Applications Conference, COMPSAC 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 241-249. Conference paper, Published paper (Refereed)
Abstract [en]

Prompt-based language models have produced encouraging results in numerous applications, including Named Entity Recognition (NER), which aims to identify entities in a sentence and assign their types. However, the strong performance of most available NER approaches depends heavily on the design of discrete prompts and of a verbalizer that maps model-predicted outputs to entity categories, both of which are complicated undertakings. To address these challenges, we present ContrastNER, a prompt-based NER framework that employs both discrete and continuous tokens in prompts and uses a contrastive learning approach to learn the continuous prompts and forecast entity types. The experimental results demonstrate that ContrastNER achieves performance competitive with state-of-the-art NER methods in high-resource settings and outperforms state-of-the-art models in low-resource circumstances, without requiring extensive manual prompt engineering or verbalizer design.
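The contrastive ingredient can be illustrated with an InfoNCE-style loss that pulls an entity representation toward its gold type embedding and away from the other types; the shapes and temperature below are assumptions, not the paper's settings.

import torch
import torch.nn.functional as F

def contrastive_type_loss(entity_emb, type_embs, gold_idx, temperature=0.1):
    # entity_emb: (d,); type_embs: (n_types, d); gold_idx: index of the true type.
    sims = F.cosine_similarity(entity_emb.unsqueeze(0), type_embs) / temperature
    # Cross-entropy over similarities = InfoNCE with the gold type as the positive.
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([gold_idx]))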

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Contrastive learning, Language Models, Named Entity Recognition, Prompt-based learning
National Category
Natural Language Processing; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-336748 (URN), 10.1109/COMPSAC57700.2023.00038 (DOI), 001046484100028, 2-s2.0-85168863373 (Scopus ID)
Conference
47th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2023, Jun 26 2023 - Jun 30 2023, Hybrid, Torino, Italy
Note

Part of proceedings ISBN 9798350326970

QC 20231031

Available from: 2023-09-19 Created: 2023-09-19 Last updated: 2025-02-05. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-2748-8929
