kth.se Publications (10 of 124)
Hammerfald, K., Schmidt, F., Vlassov, V., Haaland Jahren, H. & Solbakken, O. A. (2025). Leveraging large language models to identify microcounseling skills in psychotherapy transcripts. Psychotherapy Research, 1-19
2025 (English). In: Psychotherapy Research, ISSN 1050-3307, E-ISSN 1468-4381, pp. 1-19. Article in journal (Refereed), Published
Abstract [en]

Objective: Microcounseling skills are fundamental to effective psychotherapy, yet manual coding is time- and resource-intensive. This study explores the potential of large language models (LLMs) to automate the identification of these skills in therapy sessions. Method: We fine-tuned GPT-4.1 on a set of psychotherapy transcripts annotated by human coders. The model was trained to classify therapist utterances, generate explanations for its decisions, and propose alternative responses. The pipeline included transcript preprocessing, dialogue segmentation, and supervised fine-tuning. Results: The model achieved solid performance (Accuracy: 0.78; Precision: 0.79; Recall: 0.78; F1: 0.78; Specificity: 0.77; Cohen's κ: 0.69). It reliably detected common and structurally distinct skills but struggled with more nuanced skills that rely on understanding implicit relational dynamics. Conclusion: Despite limitations, fine-tuned LLMs have potential for enhancing psychotherapy research and clinical practice by providing scalable, automated coding of therapist skills.
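The reported Cohen's κ corrects raw agreement for the agreement expected by chance. A minimal sketch of κ for two label sequences follows. The skill labels are hypothetical, and the abstract does not say how the multi-class precision, recall, and F1 were averaged, so only κ itself is illustrated.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of utterances given the same label.
    # With one rater taken as ground truth, this equals accuracy.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement p_e: chance agreement given each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical skill labels for four therapist utterances (illustrative only).
human = ["question", "reflection", "question", "summary"]
model = ["question", "reflection", "summary", "summary"]
print(round(cohens_kappa(human, model), 3))  # 0.636
```

Here p_o = 0.75 and p_e = 0.3125, so κ = 0.4375 / 0.6875 ≈ 0.636, noticeably lower than the raw 75% agreement.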

Place, publisher, year, edition, pages
Informa UK Limited, 2025
Keywords
artificial intelligence, counseling skills, large language models, machine learning, natural language processing
National Category
Applied Psychology; Natural Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-369938 (URN), 10.1080/10503307.2025.2539405 (DOI), 001550802700001, 40817802 (PubMedID), 2-s2.0-105013461117 (Scopus ID)
Note

QC 20250918

Available from: 2025-09-18 Created: 2025-09-18 Last updated: 2025-09-18 Bibliographically approved
Sheikholeslami, S., Ghasemirahni, H., Payberah, A. H., Wang, T., Dowling, J. & Vlassov, V. (2025). Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning. Paper presented at The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys). ACM Digital Library
2025 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning (ML) and Deep Learning (DL) research, ablation studies are typically performed to provide insights into the individual contribution of different building blocks and components of an ML/DL system (e.g., a deep neural network), as well as to justify that certain additions or modifications to an existing ML/DL system can result in the proposed improved performance. Although dedicated frameworks for performing ablation studies have been introduced in recent years, conducting such experiments still requires tedious, repetitive work, typically maintaining nearly identical versions of code that correspond to different ablation trials. Inspired by the recent promising performance of Large Language Models (LLMs) in the generation and analysis of ML/DL code, in this paper we discuss the potential of LLMs as facilitators of ablation study experiments for scientific research projects that involve or deal with ML and DL models. We first discuss the different ways in which LLMs can be utilized for ablation studies and then present the prototype of a tool called AblationMage, which leverages LLMs to semi-automate the overall process of conducting ablation study experiments. We showcase the usability of AblationMage through three experiments, including one in which we reproduce the ablation studies from a recently published applied DL paper.
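AblationMage's interface is not described in the abstract, so the sketch below does not reproduce it. It only illustrates the bookkeeping an ablation study involves, assuming a simple leave-one-component-out design: one baseline configuration, one trial per disabled component, and each component's contribution measured as the drop in score. `toy_evaluate` is a hypothetical stand-in for training and validating a model.

```python
from typing import Callable, Dict, List

Config = Dict[str, bool]


def leave_one_out_configs(components: List[str]) -> Dict[str, Config]:
    """Baseline with every component enabled, plus one trial per disabled component."""
    baseline = {name: True for name in components}
    trials = {"baseline": baseline}
    for name in components:
        ablated = dict(baseline)  # copy so trials stay independent
        ablated[name] = False
        trials[f"without_{name}"] = ablated
    return trials


def component_contributions(components: List[str],
                            evaluate: Callable[[Config], float]) -> Dict[str, float]:
    """Score drop when each component is removed (positive means the component helps)."""
    trials = leave_one_out_configs(components)
    base_score = evaluate(trials["baseline"])
    return {name: base_score - evaluate(trials[f"without_{name}"]) for name in components}


# Hypothetical stand-in for "train with this config and return validation accuracy".
def toy_evaluate(cfg: Config) -> float:
    return 0.5 + 0.2 * cfg["attention"] + 0.1 * cfg["dropout"] + 0.0 * cfg["batchnorm"]


print(component_contributions(["attention", "dropout", "batchnorm"], toy_evaluate))
```

Each trial differs from the baseline in exactly one switch, which is the property that hand-maintained, nearly identical code copies are meant to guarantee.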

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
Ablation Studies, Deep Learning, Feature Ablation, Model Ablation, Large Language Models
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360719 (URN), 10.1145/3721146.3721957 (DOI), 001477868300025, 2-s2.0-105003634645 (Scopus ID)
Conference
The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys)
Funder
Vinnova, 2016–05193
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01
Schmidt, F., Kurzawski, M. G., Hammerfald, K., Jahren, H. H., Solbakken, O. A. & Vlassov, V. (2024). A Scalable System Architecture for Composition and Deployment of Machine Learning Models in Cognitive Behavioral Therapy. In: 2024 IEEE International Conference on Digital Health (ICDH). Paper presented at 2024 IEEE International Conference on Digital Health (ICDH), Shenzhen, China, 07-13 July 2024 (pp. 79-86). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE International Conference on Digital Health (ICDH), Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 79-86. Conference paper, Published paper (Refereed)
Abstract [en]

Machine learning (ML) models are a valuable tool for decision support in internet-delivered cognitive behavioral therapy (iCBT). However, while the literature extensively covers model development, a gap exists in the practical deployment of these models. This work proposes a novel system architecture to efficiently compose and deploy an ensemble of ML models tailored for iCBT in the cloud. We first establish system requirements and evaluation metrics based on the iCBT workflow and derive the system architecture based on these. We develop and implement a prototype of the system architecture for the composition and deployment of ML models in iCBT and validate the prototype with representative data through unit and integration tests. The results of the conceptual validation show that the prototype successfully facilitates the deployment of the models per the desired system requirements. Finally, we outline a path to a scalable deployment.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Measurement, Systems architecture, Prototypes, Medical treatment, Machine learning, Data models, Electronic healthcare, scalable machine learning, internet-delivered cognitive behavioral therapy, iCBT, clinical decision support system, CDSS
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-352649 (URN), 10.1109/ICDH62654.2024.00024 (DOI), 001308534900012, 2-s2.0-85203816332 (Scopus ID)
Conference
2024 IEEE International Conference on Digital Health (ICDH), Shenzhen, China, 07-13 July 2024
Projects
ALEC-2
Note

Part of ISBN 979-8-3503-6857-4

QC 20240906

Available from: 2024-09-04 Created: 2024-09-04 Last updated: 2024-11-05 Bibliographically approved
Xu, Z., Nordström, P., Sheikholeslami, S., Al-Shishtawy, A. & Vlassov, V. (2024). A Semi-Supervised Model for Non-Cellular Elements Segmentation in Microscopy Images of Wood. In: 2024 IEEE International Conference on Big Data (BigData). Paper presented at IEEE International Conference on Big Data, Washington DC, USA, Dec 15 - Dec 18, 2024 (pp. 2049-2056). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE International Conference on Big Data (BigData), Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 2049-2056. Conference paper, Published paper (Refereed)
Abstract [en]

In wood science, accurate segmentation of non-cellular elements in microscopy images is critical for assessing wood quality and understanding growth patterns. Yet it is challenging due to the complex morphology of wood components. This work explores the development of a semi-supervised deep learning model for segmenting non-cellular elements in wood microscopy images of Norway spruce, an essential source of construction material in Europe, addressing the fact that manual annotation is labor-intensive and requires expertise. The segmentation model employs advanced deep learning architectures, including Convolutional Neural Networks and a Vision Transformer, to capture the intrinsic patterns embedded in wood structures. We propose a Pixel-level Guided Mean-Teacher (PG-MT) framework as an improvement to the Mean-Teacher semi-supervised learning technique. Our framework enables pixel-level guided correction to enhance segmentation accuracy and model robustness with limited labeled datasets. Our experimental evaluations show that the proposed PG-MT framework improved the Dice score for medullary ray segmentation by 0.95% and the IoU score by 1.14% over the Uncertainty-Aware Mean-Teacher (UA-MT) framework. Additionally, integration with laboratory instruments demonstrates the model’s effectiveness in accurately estimating cross-sectional cell wall thickness, showing a strong correlation with X-ray measurements. This result validates the model’s practical applicability in laboratory settings, enhancing the analysis of wood properties. This work provides a robust semi-supervised DL framework for segmenting non-cellular elements in wood microscopy images, significantly reducing the annotation burden and paving the way for more automated and precise wood property analysis.
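The abstract reports PG-MT gains in Dice and IoU over UA-MT but does not detail the pixel-level guided correction. The sketch below therefore covers only the two standard pieces it builds on: the Dice and IoU overlap metrics for binary masks, and the exponential-moving-average (EMA) update by which a Mean-Teacher teacher tracks its student. The `alpha = 0.99` default is an assumption, not a value from the paper.

```python
from typing import List, Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice and IoU for two binary masks flattened to equal-length 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size_p, size_t = sum(pred), sum(target)
    union = size_p + size_t - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    return 2 * inter / (size_p + size_t), inter / union


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.99) -> List[float]:
    """Mean-Teacher step: teacher weights become an EMA of the student's weights."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)  # 0.5 0.3333333333333333
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so they rise and fall together; averages over many images need not satisfy this identity exactly.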

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Computer vision, Deep learning, Microscopy image segmentation, Semi-supervised learning, Mean Teacher, Wood science
National Category
Computer Sciences; Wood Science
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-358847 (URN), 10.1109/BigData62323.2024.10825915 (DOI), 2-s2.0-85218026779 (Scopus ID)
Conference
IEEE International Conference on Big Data, Washington DC, USA, Dec 15 - Dec 18, 2024
Funder
Bio4Energy; Vinnova, 2016–05193
Note

Part of ISBN 979-8-3503-6248-0

QC 20250122

Available from: 2025-01-21 Created: 2025-01-21 Last updated: 2025-03-13 Bibliographically approved
Sheikholeslami, S., Wang, T., Payberah, A. H., Dowling, J. & Vlassov, V. (2024). Deep Neural Network Weight Initialization from Hyperparameter Tuning Trials. In: Neural Information Processing. Paper presented at ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand. Springer Nature
2024 (English). In: Neural Information Processing, Springer Nature, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Training of deep neural networks from scratch requires initialization of the neural network weights as a first step. Over the years, many policies and techniques for weight initialization have been proposed and widely used, including Kaiming initialization and different variants of random initialization. On the other hand, another requirement for starting the training stage is to choose and set suitable hyperparameter values, which are usually obtained by performing several hyperparameter tuning trials. In this paper, we study the suitability of weight initialization using weights obtained from different epochs of hyperparameter tuning trials and compare it to Kaiming uniform (random) weight initialization for image classification tasks. Based on an experimental evaluation using ResNet-18, ResNet-152, and InceptionV3 models, and CIFAR-10, CIFAR-100, Tiny ImageNet, and Food-101 datasets, we show that weight initialization from hyperparameter tuning trials can speed up the training of deep neural networks by up to 2x while maintaining or improving the best test accuracy of the trained models, when compared to random initialization.
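A minimal sketch contrasting the two starting points compared in the abstract, under stated assumptions: Kaiming uniform initialization with the ReLU gain, and reuse of a checkpoint saved during hyperparameter tuning. The helper `init_from_tuning_trials`, the trial dictionary layout, and the rule of picking the trial with the best validation accuracy are hypothetical; the checkpoints here are random placeholders standing in for weights a real tuning run would have trained.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def kaiming_uniform(fan_in: int, fan_out: int, rng: random.Random) -> Matrix:
    """Kaiming (He) uniform init with ReLU gain: U(-b, b), b = sqrt(2) * sqrt(3 / fan_in)."""
    bound = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def init_from_tuning_trials(trials: Dict[str, dict], epoch: int) -> Matrix:
    """Hypothetical helper: reuse the weights the best tuning trial saved at `epoch`.

    `trials` maps a trial id to {"val_acc": float, "checkpoints": {epoch: weights}}.
    """
    best = max(trials, key=lambda tid: trials[tid]["val_acc"])
    return trials[best]["checkpoints"][epoch]


rng = random.Random(0)
# Placeholder checkpoints: in practice these are weights trained during each tuning trial.
trials = {
    "lr=0.1": {"val_acc": 0.71, "checkpoints": {5: kaiming_uniform(4, 2, rng)}},
    "lr=0.01": {"val_acc": 0.78, "checkpoints": {5: kaiming_uniform(4, 2, rng)}},
}
w_random = kaiming_uniform(4, 2, rng)                # cold start
w_warm = init_from_tuning_trials(trials, epoch=5)    # warm start from the lr=0.01 trial
```

The design point is that the tuning trials have already paid for some training, so starting the final run from one of their checkpoints reuses that work instead of discarding it.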

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
weight initialization, deep neural network training, hyperparameter tuning, model training, deep learning
National Category
Computer Sciences; Artificial Intelligence
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-358848 (URN), 10.1007/978-981-96-6954-7_5 (DOI)
Conference
ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01 Bibliographically approved
Johannesson, T., Rubensson, I., Sheikholeslami, S., Al-Shishtawy, A. & Vlassov, V. (2024). DUGET: Leveraging Machine Learning for Dynamic User Grouping and Evolution Tracking in Public Transit Systems. In: Proceedings 2024 IEEE International Conference on Big Data (BigData). Paper presented at IEEE International Conference on Big Data, Washington DC, USA, 15-18 December, 2024 (pp. 1785-1794). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings 2024 IEEE International Conference on Big Data (BigData), Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 1785-1794. Conference paper, Published paper (Refereed)
Abstract [en]

This work aims to explore the use of machine learning techniques, particularly clustering and cluster evolution tracking, to analyze travel patterns in public transportation in a city and provide valuable insights for urban transit planning and optimization. Clustering involves identifying and grouping similar objects, such as passengers with different ticket types, and distinguishing them from dissimilar objects in other groups. Over time, groups can change, so tracking this change can provide more detailed and valuable insights than analyzing data in aggregates. Clustering and cluster evolution tracking can reveal groups of passengers that are more or less affected by changes such as seasonality or fare increases. We propose a framework called DUGET (Dynamic User Grouping and Evolution Tracking), which clusters anonymized users based on their ticket choices and temporal travel patterns using a multi-step approach. The clusters are then tracked over time using Jaccard similarity based on memberships, allowing for the analysis and visualization of changes. Our experiments using a real-world public transportation dataset collected in Stockholm, Sweden, show the feasibility of tracking change over time in public transportation by examining passenger behavior as a temporal aggregate. The framework we propose is generalizable and can be used for future projects to understand trends in groups of objects.
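A minimal sketch of membership-based cluster tracking with Jaccard similarity. It assumes clusters are sets of anonymized user ids and that a previous cluster continues as the current cluster it overlaps most, provided the similarity reaches a threshold (0.5 here, an assumed value). DUGET's actual matching rules, including how it handles splits and merges, are not given in the abstract.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B| over sets of member ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def track_clusters(prev: Dict[str, set], curr: Dict[str, set],
                   threshold: float = 0.5) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Match each previous cluster to its most similar current cluster.

    Returns (matches, disappeared, emerged). A match requires Jaccard >= threshold.
    """
    matches: Dict[str, str] = {}
    for pid, members in prev.items():
        scores = {cid: jaccard(members, cm) for cid, cm in curr.items()}
        if scores:
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                matches[pid] = best
    disappeared = [pid for pid in prev if pid not in matches]
    matched_now = set(matches.values())
    emerged = [cid for cid in curr if cid not in matched_now]
    return matches, disappeared, emerged


# Hypothetical anonymized user ids in two consecutive time windows.
week1 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
week2 = {"X": {1, 2, 3, 9}, "Y": {10, 11}}
print(track_clusters(week1, week2))  # ({'A': 'X'}, ['B'], ['Y'])
```

Because matching is greedy per previous cluster, two old clusters can map to the same new one, which can be read as a merge.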

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Smart card data, Temporal patterns, Clustering, Customer segmentation, Public transportation, Machine learning
National Category
Computer Sciences; Transport Systems and Logistics
Research subject
Computer Science; Planning and Decision Analysis, Urban and Regional Studies; Transport Science
Identifiers
urn:nbn:se:kth:diva-358846 (URN), 10.1109/BigData62323.2024.10825688 (DOI), 2-s2.0-85218050805 (Scopus ID)
Conference
IEEE International Conference on Big Data, Washington DC, USA, 15-18 December, 2024
Note

Part of ISBN 979-8-3503-6248-0

QC 20250122

Available from: 2025-01-21 Created: 2025-01-21 Last updated: 2025-02-26 Bibliographically approved
Rauniyar, A., Hagos, D. H., Jha, D., Håkegård, J. E., Bagci, U., Rawat, D. B. & Vlassov, V. (2024). Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions. IEEE Internet of Things Journal, 11(5), 7374-7398
2024 (English). In: IEEE Internet of Things Journal, ISSN 2327-4662, Vol. 11, no. 5, pp. 7374-7398. Article in journal (Refereed), Published
Abstract [en]

With the advent of the Internet of Things (IoT), artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms, data-driven medical applications have emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. This has gained a lot of attention from both academia and industry, leading to significant improvements in healthcare quality. However, the adoption of AI-driven medical applications still faces tough challenges, including meeting security, privacy, and Quality-of-Service (QoS) standards. Recent developments in federated learning (FL) have made it possible to train complex machine-learned models in a distributed manner, and FL has become an active research domain, particularly for processing medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, in this article, we explore the present and future of FL technology in medical applications where data sharing is a significant challenge. We delve into the current research trends and their outcomes, unraveling the complexities of designing reliable and scalable FL models. This article outlines the fundamental statistical issues in FL, tackles device-related problems, addresses security challenges, and navigates the complexity of privacy concerns, all while highlighting its transformative potential in the medical field. Our study primarily focuses on medical applications of FL, particularly in the context of global cancer diagnosis. We highlight the potential of FL to enable computer-aided diagnosis tools that address this challenge with greater effectiveness than traditional data-driven methods. Recent literature has shown that FL models are robust and generalize well to new data, which is essential for medical applications.
We hope that this comprehensive review will serve as a checkpoint for the field, summarizing the current state of the art and identifying open problems and future research directions.
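The survey covers FL broadly rather than one algorithm. To make the core idea concrete, the sketch below shows the widely used FedAvg aggregation step, swapped in here as an illustration: each client trains locally, and the server averages the parameter vectors weighted by local sample counts. The hospital sizes and vectors are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors weighted by local sample counts.

    Only parameters reach the server; the clients' (e.g. hospitals') raw records stay local.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with 1 and 3 local samples.
print(fed_avg([[1.0, 0.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.0]
```

Weighting by sample count keeps a small site from pulling the global model as strongly as a large one; privacy add-ons such as secure aggregation or differential privacy are outside this sketch.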

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Medical services; Medical diagnostic imaging; Biomedical equipment; Data privacy; Surveys; Internet of Things; Cancer; Artificial intelligence (AI); communication; data privacy; edge computing; federated learning (FL); foundational model (FMs); large language model (LLM); medical applications; security
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-345592 (URN), 10.1109/JIOT.2023.3329061 (DOI), 001203463700006, 2-s2.0-85181574699 (Scopus ID)
Note

QC 20240415

Available from: 2024-04-12 Created: 2024-04-12 Last updated: 2024-07-02 Bibliographically approved
Krylova, S., Schmidt, F. & Vlassov, V. (2024). Leveraging Machine Learning Models to Predict the Outcome of Digital Medical Triage Interviews. In: Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024. Paper presented at 23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024, Miami, United States of America, December 18-20, 2024 (pp. 160-167). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 160-167. Conference paper, Published paper (Refereed)
Abstract [en]

One of the key advances in digital healthcare is the implementation of digital triage, which uses online tools, web, and mobile apps to efficiently assess patient needs, prioritize cases, and direct patients to appropriate healthcare services. Many existing digital triage systems are questionnaire-based, guiding patients to appropriate care levels based on information (e.g., symptoms, medical history, and urgency) that patients provide by answering questionnaires. Such a system often uses a deterministic model with predefined rules to determine care levels. It struggles with incomplete triage interviews, since it can only assist patients who finish the process. In this study, we explore the use of machine learning (ML) to predict outcomes of unfinished interviews, aiming to enhance patient care and service quality. Predicting triage outcomes from incomplete data is crucial for patient safety and healthcare efficiency. Our findings show that decision-tree models, particularly LGBMClassifier and CatBoostClassifier, achieve over 80% accuracy in predicting outcomes from complete interviews, with prediction accuracy correlating linearly with the degree of interview completeness. For example, LGBMClassifier achieves 88.2% prediction accuracy for interviews with 100% completeness, 79.6% for 80% completeness, 58.9% for 60% completeness, and 45.7% for 40% completeness. The TabTransformer model demonstrated exceptional accuracy of over 80% for all degrees of completeness but required extensive training time, indicating a need for more powerful computational resources. The study highlights the linear correlation between interview completeness and the predictive power of the decision-tree models.
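The linear-correlation claim can be checked against the four LGBMClassifier figures quoted in the abstract. The sketch computes Pearson's r and the least-squares slope from exactly those numbers; with only four points this is a sanity check, not a statistical test.

```python
import math
from typing import Sequence, Tuple

# LGBMClassifier results quoted in the abstract.
completeness = [100, 80, 60, 40]      # % of the interview answered
accuracy = [88.2, 79.6, 58.9, 45.7]   # prediction accuracy in %


def pearson_and_slope(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation r and least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


r, slope = pearson_and_slope(completeness, accuracy)
print(f"r = {r:.3f}, slope = {slope:.3f}")  # r = 0.989, slope = 0.741
```

A slope of about 0.74 means that, over this range, each 10 percentage points of unanswered interview costs roughly 7.4 points of accuracy, although the step from 80% to 60% completeness (20.7 points) is steeper than the step from 60% to 40% (13.2 points).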

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
digital health, digital triage, machine learning classification
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-361974 (URN), 10.1109/ICMLA61862.2024.00028 (DOI), 2-s2.0-105001043732 (Scopus ID)
Conference
23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024, Miami, United States of America, December 18-20, 2024
Note

Part of ISBN 9798350374889

QC 20250404

Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-04-04 Bibliographically approved
Insalata, B., Schmidt, F. & Vlassov, V. (2024). Multimodal survival prediction using TabTransformer and BioClinicalBERT on MIMIC-III. In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024. Paper presented at 2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, December 15-18, 2024 (pp. 1986-1992). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 1986-1992. Conference paper, Published paper (Refereed)
Abstract [en]

This paper explores the development and evaluation of a multimodal system for survival prediction in clinical settings, leveraging both structured electronic health records and unstructured clinical notes. The core objective is to enhance the accuracy and reliability of survival predictions in Intensive Care Units by integrating diverse data types through advanced machine learning models. The system combines the novel architecture of Tabular Transformers, adapted to process structured data such as patient demographics, medical history, and diagnoses, with Multi-Layer Perceptrons for text embeddings obtained from BioClinicalBERT, a specialized model for clinical narratives. These models' integration aims to capture the complex and multifaceted nature of patient profiles, thereby improving prediction performance. The finalized system, a Logistic Regression instance that aggregates the obtained predictions, demonstrates superior performance on evaluation metrics, highlighting the system's ability to identify high-risk patients. Comprehensive benchmarking against decision trees, standalone MLPs, and various configurations underscored the robustness of the proposed system. This research highlights the transformative potential of multimodal data integration in medical predictive modeling. Thus, medical professionals can guide their efforts and prioritization of patient care, enabling more efficient and targeted allocation of resources during triage.
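The abstract describes the final system as a logistic regression that aggregates the per-modality predictions, that is, a late-fusion (stacking) layer. A minimal sketch under assumptions: each upstream model (TabTransformer on structured data, an MLP on BioClinicalBERT embeddings) has already produced a risk probability per patient, and the fusion layer is fit by plain batch gradient descent. The numbers are invented toy values, not MIMIC-III data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fusion(probs: Sequence[Sequence[float]], labels: Sequence[int],
               lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic regression over per-modality probabilities, fit by batch gradient descent."""
    k, n = len(probs[0]), len(labels)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dlogit
            for i in range(k):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Columns: hypothetical risk scores from the tabular model and from the clinical-note model.
probs = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
labels = [1, 1, 1, 0, 0, 0]
w, b = fit_fusion(probs, labels)
print(predict(w, b, (0.85, 0.9)) > 0.5, predict(w, b, (0.15, 0.1)) > 0.5)  # True False
```

In practice the fusion layer should be fit on predictions for patients the upstream models did not train on; otherwise it learns to trust overfit scores.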

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Clinical Notes, Electronic Health Records, Multimodality, Survival Prediction, Tabular Transformers
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-360559 (URN), 10.1109/BigData62323.2024.10826011 (DOI), 2-s2.0-85218041072 (Scopus ID)
Conference
2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, December 15-18, 2024
Note

Part of ISBN 9798350362480

QC 20250226

Available from: 2025-02-26 Created: 2025-02-26 Last updated: 2025-02-26 Bibliographically approved
Schmidt, F., Hammerfald, K., Jahren, H. H., Payberah, A. H. & Vlassov, V. (2024). Single-pass Hierarchical Text Classification with Large Language Models. In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024. Paper presented at 2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, Dec 15 2024 - Dec 18 2024 (pp. 5412-5421). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 5412-5421. Conference paper, Published paper (Refereed)
Abstract [en]

Numerous text classification tasks inherently possess hierarchical structures among classes, which traditional classification paradigms often overlook. This study introduces novel approaches for hierarchical text classification using Large Language Models (LLMs), exploiting taxonomies to improve accuracy and traceability in a zero-shot setting. We propose two hierarchical classification methods, namely (i) single-path and (ii) path-traversal, both of which leverage the hierarchical class structures inherent in the target classes (e.g., a bird is a type of animal that belongs to a species) and improve on naïve hierarchical text classification from the literature. We implement them as prompts for generative models such as OpenAI GPTs and benchmark them against discriminative language models (BERT and RoBERTa). We measure classification performance (precision, recall, and F1-score) against computational efficiency (time and cost). In evaluations on two diverse datasets, namely ComFaSyn, containing mental health patients' diary entries, and DBpedia, containing structured information extracted from Wikipedia, we observed that our methods, without any fine-tuning or few-shot examples, achieve results comparable to flat classification and existing methods from the literature with minimal increases in prompt size and processing time.
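The abstract names the single-path and path-traversal methods without giving their prompts. The sketch below shows one reading of the single-path idea: offer every root-to-leaf path of the taxonomy as a candidate label in one call, then check that the answer is a real path, which is what makes the prediction traceable. `choose` is a hypothetical stand-in for the LLM call, the taxonomy is a toy, and path-traversal is not reproduced.

```python
from typing import Callable, Dict, List, Sequence

Taxonomy = Dict[str, List[str]]                # parent label -> child labels; leaves have no entry
Chooser = Callable[[str, Sequence[str]], str]  # hypothetical LLM call: (text, options) -> option


def all_paths(tax: Taxonomy, node: str = "root") -> List[List[str]]:
    """Every root-to-leaf label path, excluding the synthetic 'root' node."""
    children = tax.get(node, [])
    if not children:
        return [[node]]
    prefix = [] if node == "root" else [node]
    return [prefix + path for child in children for path in all_paths(tax, child)]


def is_valid_path(tax: Taxonomy, path: Sequence[str]) -> bool:
    """True if `path` follows parent->child edges from the root down to a leaf."""
    parent = "root"
    for label in path:
        if label not in tax.get(parent, []):
            return False
        parent = label
    return bool(path) and not tax.get(parent)


def single_path_classify(text: str, tax: Taxonomy, choose: Chooser) -> List[str]:
    """One call: offer complete paths as options so the answer is traceable to the taxonomy."""
    options = [" > ".join(p) for p in all_paths(tax)]
    path = choose(text, options).split(" > ")
    if not is_valid_path(tax, path):
        raise ValueError(f"model returned a path outside the taxonomy: {path}")
    return path


taxonomy = {
    "root": ["animal", "plant"],
    "animal": ["bird", "mammal"],
    "bird": ["sparrow", "eagle"],
    "mammal": ["whale"],
    "plant": ["tree"],
}


# Keyword-matching stand-in for the LLM prompt (hypothetical, for testing only).
def keyword_chooser(text: str, options: Sequence[str]) -> str:
    return next(o for o in options if o.split(" > ")[-1] in text)


print(single_path_classify("a sparrow sat on the fence", taxonomy, keyword_chooser))
# ['animal', 'bird', 'sparrow']
```

Validating the returned path against the taxonomy catches answers that mix labels from different branches, a failure mode that flat label lists cannot even express.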

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Hierarchical text classification, Large Language Models (LLMs), zero-shot classification
National Category
Natural Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-360563 (URN), 10.1109/BigData62323.2024.10825412 (DOI), 2-s2.0-85218008858 (Scopus ID)
Conference
2024 IEEE International Conference on Big Data, BigData 2024, Washington, United States of America, Dec 15 2024 - Dec 18 2024
Note

Part of ISBN 9798350362480

QC 20250226

Available from: 2025-02-26 Created: 2025-02-26 Last updated: 2025-02-26 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-6779-7435
