Publications (10 of 43)
Sheikholeslami, S., Ghasemirahni, H., Payberah, A. H., Wang, T., Dowling, J. & Vlassov, V. (2025). Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning. Paper presented at The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys). ACM Digital Library
2025 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning (ML) and Deep Learning (DL) research, ablation studies are typically performed to provide insights into the individual contribution of different building blocks and components of an ML/DL system (e.g., a deep neural network), and to justify that certain additions or modifications to an existing ML/DL system are responsible for the proposed performance improvements. Although dedicated frameworks for performing ablation studies have been introduced in recent years, conducting such experiments still requires tedious work, typically involving the maintenance of redundant, nearly identical versions of code that correspond to different ablation trials. Inspired by the recent promising performance of Large Language Models (LLMs) in the generation and analysis of ML/DL code, in this paper we discuss the potential of LLMs as facilitators of ablation study experiments for scientific research projects that involve ML and DL models. We first discuss the different ways in which LLMs can be utilized for ablation studies and then present a prototype tool called AblationMage, which leverages LLMs to semi-automate the overall process of conducting ablation study experiments. We showcase the usability of AblationMage through three experiments, including one in which we reproduce the ablation studies from a recently published applied DL paper.
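The abstract describes LLMs generating ablated code variants in place of hand-maintained, near-identical scripts. A minimal sketch of that idea, assuming nothing about AblationMage's actual interface (the prompt template, function name, and component names below are all hypothetical):

```python
# Hypothetical sketch: one LLM prompt per ablation trial, each asking for a
# copy of the training script with a single component removed. This stands in
# for maintaining near-identical script versions by hand; it is not the
# AblationMage API.

BASE_PROMPT = (
    "You are given a model training script. Produce a modified copy in which "
    "the component '{component}' is removed or disabled, keeping everything "
    "else unchanged.\n\n---\n{script}"
)

def build_ablation_prompts(script: str, components: list[str]) -> dict[str, str]:
    """Return one LLM prompt per ablation trial (one per removed component)."""
    return {c: BASE_PROMPT.format(component=c, script=script) for c in components}

prompts = build_ablation_prompts(
    script="model = ConvNet(use_dropout=True, use_batchnorm=True)",
    components=["dropout", "batchnorm"],
)
```

Each prompt would then be sent to an LLM of choice, and the returned script executed as one ablation trial.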

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
Ablation Studies, Deep Learning, Feature Ablation, Model Ablation, Large Language Models
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360719 (URN), 10.1145/3721146.3721957 (DOI), 001477868300025 (), 2-s2.0-105003634645 (Scopus ID)
Conference
The 5th Workshop on Machine Learning and Systems (EuroMLSys), co-located with the 20th European Conference on Computer Systems (EuroSys)
Funder
Vinnova, 2016–05193
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01
Sheikholeslami, S., Wang, T., Payberah, A. H., Dowling, J. & Vlassov, V. (2024). Deep Neural Network Weight Initialization from Hyperparameter Tuning Trials. In: Neural Information Processing. Paper presented at ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand. Springer Nature
2024 (English) In: Neural Information Processing, Springer Nature, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Training of deep neural networks from scratch requires initialization of the neural network weights as a first step. Over the years, many policies and techniques for weight initialization have been proposed and widely used, including Kaiming initialization and different variants of random initialization. On the other hand, another requirement for starting the training stage is to choose and set suitable hyperparameter values, which are usually obtained by performing several hyperparameter tuning trials. In this paper, we study the suitability of weight initialization using weights obtained from different epochs of hyperparameter tuning trials and compare it to Kaiming uniform (random) weight initialization for image classification tasks. Based on an experimental evaluation using ResNet-18, ResNet-152, and InceptionV3 models, and CIFAR-10, CIFAR-100, Tiny ImageNet, and Food-101 datasets, we show that weight initialization from hyperparameter tuning trials can speed up the training of deep neural networks by up to 2x while maintaining or improving the best test accuracy of the trained models, when compared to random initialization.
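The initialization policy the abstract compares against Kaiming uniform can be sketched as follows; the trial-record fields (`val_accuracy`, `weights`) are illustrative stand-ins for whatever a hyperparameter tuning framework checkpoints, and the paper's actual selection criterion may differ:

```python
# Sketch, under stated assumptions: initialize a layer either with Kaiming
# uniform (the random baseline) or by copying weights from the best-performing
# hyperparameter tuning trial that saved a checkpoint.
import math
import random

def kaiming_uniform(fan_in: int, n: int, rng: random.Random) -> list[float]:
    """Kaiming-uniform baseline: n samples from U(-b, b), b = sqrt(6 / fan_in)."""
    bound = math.sqrt(6.0 / fan_in)
    return [rng.uniform(-bound, bound) for _ in range(n)]

def init_from_trials(trials: list[dict], fan_in: int, n: int,
                     rng: random.Random) -> list[float]:
    """Prefer weights checkpointed by the best tuning trial; fall back to
    Kaiming uniform when no trial saved weights."""
    finished = [t for t in trials if t.get("weights")]
    if not finished:
        return kaiming_uniform(fan_in, n, rng)
    best = max(finished, key=lambda t: t["val_accuracy"])
    return list(best["weights"])
```

In a real setting the copied weights would come from a specific epoch of the trial's checkpoint, which the paper also treats as a choice to study.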

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
weight initialization, deep neural network training, hyperparameter tuning, model training, deep learning
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-358848 (URN), 10.1007/978-981-96-6954-7_5 (DOI)
Conference
ICONIP: International Conference on Neural Information Processing, December 2-6, Auckland, New Zealand
Note

QC 20250303

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-07-01 Bibliographically checked
de la Rua Martinez, J., Buso, F., Kouzoupis, A., Ormenisan, A. A., Niazi, S., Bzhalava, D., . . . Dowling, J. (2024). The Hopsworks Feature Store for Machine Learning. In: SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data. Paper presented at 2024 International Conference on Management of Data, SIGMOD 2024, Santiago, Chile, Jun 9 2024 - Jun 15 2024 (pp. 135-147). Association for Computing Machinery (ACM)
2024 (English) In: SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data, Association for Computing Machinery (ACM), 2024, pp. 135-147. Conference paper, Published paper (Refereed)
Abstract [en]

Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether it is a batch or online ML system. The feature store for ML has recently emerged as a single data platform for managing ML data throughout the ML lifecycle, from feature engineering to model training to inference. In this paper, we present the Hopsworks feature store for machine learning as a highly available platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. We introduce and address challenges solved by the feature stores related to feature reuse, how to organize data transformations, and how to ensure correct and consistent data between feature engineering, model training, and model inference. We present the engineering challenges in building high-performance query services for a feature store and show how Hopsworks outperforms existing cloud feature stores for training and online inference query workloads.
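A toy sketch of the two access patterns the abstract contrasts: columnar batch reads for model training and row-oriented, low-latency lookups for online inference. This is purely illustrative and is not the Hopsworks API:

```python
# Minimal in-memory stand-in for a feature store, showing why one platform
# must serve both columnar (training) and row-oriented (inference) queries.
# Class and method names are hypothetical.
class MiniFeatureStore:
    def __init__(self):
        self._rows = {}  # entity key -> {feature name: value}

    def insert(self, key, features):
        self._rows[key] = dict(features)

    def batch_read(self, feature_names):
        """Columnar read across all entities (training-style access)."""
        return {f: [row.get(f) for row in self._rows.values()]
                for f in feature_names}

    def online_read(self, key):
        """Single-row lookup by entity key (inference-style access)."""
        return self._rows[key]

fs = MiniFeatureStore()
fs.insert("u1", {"clicks": 3, "age": 30})
fs.insert("u2", {"clicks": 5, "age": 41})
```

A production feature store additionally keeps the offline and online copies of these features consistent, which is one of the challenges the paper addresses.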

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Series
Proceedings of the ACM SIGMOD International Conference on Management of Data, ISSN 0730-8078
Keywords
arrow flight, duckdb, feature store, mlops, rondb
HSV category
Identifiers
urn:nbn:se:kth:diva-348769 (URN), 10.1145/3626246.3653389 (DOI), 2-s2.0-85196429961 (Scopus ID)
Conference
2024 International Conference on Management of Data, SIGMOD 2024, Santiago, Chile, Jun 9 2024 - Jun 15 2024
Note

QC 20240628

Part of ISBN 979-840070422-2

Available from: 2024-06-27 Created: 2024-06-27 Last updated: 2024-06-28 Bibliographically checked
Chikafa, G., Sheikholeslami, S., Niazi, S., Dowling, J. & Vlassov, V. (2023). Cloud-native RStudio on Kubernetes for Hopsworks.
2023 (English) Manuscript (preprint) (Other academic)
Abstract [en]

To fully benefit from cloud computing, services are designed following the multi-tenant architectural model, which aims to maximize resource sharing among users. However, multi-tenancy introduces challenges in security, performance isolation, scaling, and customization. RStudio Server is an open-source Integrated Development Environment (IDE) for the R programming language, accessible through a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, following the multi-tenant model that provides RStudio as Software as a Service (SaaS). We use the popular cloud-native technologies Docker and Kubernetes to solve the problems of performance isolation, security, and scaling that arise in a multi-tenant environment. We further enable secure data sharing in RStudio Server instances to provide data privacy and allow collaboration among RStudio users. We integrate our system with Apache Spark, which can scale and handle Big Data processing workloads. We also provide a UI where users can supply custom configurations and have full control of their own RStudio Server instances. Our system was tested on a Google Cloud Platform cluster with four worker nodes, each with 30 GB of RAM. The tests on this cluster showed that 44 RStudio servers, each with 2 GB of RAM, can run concurrently. The system can scale out to potentially support hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster.

Keywords
Multi-tenancy, Cloud-native, Performance Isolation, Security, Scaling, Docker, Kubernetes, SaaS, RStudio, Hopsworks
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-336693 (URN), 10.48550/arXiv.2307.09132 (DOI)
Note

QC 20230918

Available from: 2023-09-18 Created: 2023-09-18 Last updated: 2023-09-18 Bibliographically checked
Sheikholeslami, S., Payberah, A. H., Wang, T., Dowling, J. & Vlassov, V. (2023). The Impact of Importance-Aware Dataset Partitioning on Data-Parallel Training of Deep Neural Networks. In: Distributed Applications and Interoperable Systems - 23rd IFIP WG 6.1 International Conference, DAIS 2023, Held as Part of the 18th International Federated Conference on Distributed Computing Techniques, DisCoTec 2023, Proceedings. Paper presented at 23rd IFIP International Conference on Distributed Applications and Interoperable Systems, DAIS 2023, Lisbon, Portugal, Jun 19 2023 - Jun 23 2023 (pp. 74-89). Springer Nature
2023 (English) In: Distributed Applications and Interoperable Systems - 23rd IFIP WG 6.1 International Conference, DAIS 2023, Held as Part of the 18th International Federated Conference on Distributed Computing Techniques, DisCoTec 2023, Proceedings, Springer Nature, 2023, pp. 74-89. Conference paper, Published paper (Refereed)
Abstract [en]

Deep neural networks used for computer vision tasks are typically trained on datasets consisting of thousands of images, called examples. Recent studies have shown that examples in a dataset are not of equal importance for model training and can be categorized based on quantifiable measures reflecting a notion of “hardness” or “importance”. In this work, we conduct an empirical study of the impact of importance-aware partitioning of the dataset examples across workers on the performance of data-parallel training of deep neural networks. Our experiments with CIFAR-10 and CIFAR-100 image datasets show that data-parallel training with importance-aware partitioning can perform better than vanilla data-parallel training, which is oblivious to the importance of examples. More specifically, the proper choice of the importance measure, partitioning heuristic, and the number of intervals for dataset repartitioning can improve the best accuracy of the model trained for a fixed number of epochs. We conclude that the parameters related to importance-aware data-parallel training, including the importance measure, number of warmup training epochs, and others defined in the paper, may be considered as hyperparameters of data-parallel model training.
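One plausible reading of "importance-aware partitioning" is to deal examples to workers in descending order of an importance score, so each worker receives a similar mix of hard and easy examples. The heuristic below is an illustration, not necessarily one of the heuristics evaluated in the paper:

```python
# Sketch of an importance-aware partitioning heuristic: sort examples by a
# precomputed importance score, then round-robin them across worker shards so
# every worker sees a balanced hardness distribution.
def importance_aware_partition(examples, scores, num_workers):
    """Return num_workers shards, dealt in descending importance order."""
    order = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    shards = [[] for _ in range(num_workers)]
    for rank, idx in enumerate(order):
        shards[rank % num_workers].append(examples[idx])
    return shards
```

In the paper's framing, the choice of importance measure and how often the dataset is repartitioned are themselves tunable hyperparameters of data-parallel training.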

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Data-parallel training, Distributed deep learning, Example importance
HSV category
Identifiers
urn:nbn:se:kth:diva-334525 (URN), 10.1007/978-3-031-35260-7_5 (DOI), 001288526100005 (), 2-s2.0-85164268176 (Scopus ID)
Conference
23rd IFIP International Conference on Distributed Applications and Interoperable Systems, DAIS 2023, Lisbon, Portugal, Jun 19 2023 - Jun 23 2023
Note

QC 20230823

Available from: 2023-08-23 Created: 2023-08-23 Last updated: 2025-03-04 Bibliographically checked
Hagos, D. H., Kakantousis, T., Sheikholeslami, S., Wang, T., Vlassov, V., Payberah, A. H., . . . Dowling, J. (2022). Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. Remote Sensing, 14(8), Article ID 1889.
2022 (English) In: Remote Sensing, E-ISSN 2072-4292, Vol. 14, no. 8, article id 1889. Article in journal (Refereed) Published
Abstract [en]

This paper introduces the Hopsworks platform to the entire Earth Observation (EO) data community and the Copernicus programme. Hopsworks is a scalable data-intensive open-source Artificial Intelligence (AI) platform that was jointly developed by Logical Clocks and the KTH Royal Institute of Technology for building end-to-end Machine Learning (ML)/Deep Learning (DL) pipelines for EO data. It provides the full stack of services needed to manage the entire life cycle of data in ML. In particular, Hopsworks supports the development of horizontally scalable DL applications in notebooks and the operation of workflows to support those applications, including parallel data processing, model training, and model deployment at scale. To the best of our knowledge, this is the first work that demonstrates the services and features of the Hopsworks platform, which provide users with the means to build scalable end-to-end ML/DL pipelines for EO data, as well as support for the discovery and search for EO metadata. This paper serves as a demonstration and walkthrough of the stages of building a production-level model that includes data ingestion, data preparation, feature extraction, model training, model serving, and monitoring. To this end, we provide a practical example that demonstrates the aforementioned stages with real-world EO data and includes source code that implements the functionality of the platform. We also perform an experimental evaluation of two frameworks built on top of Hopsworks, namely Maggy and AutoAblation. We show that using Maggy for hyperparameter tuning results in roughly half the wall-clock time required to execute the same number of hyperparameter tuning trials using Spark while providing linear scalability as more workers are added. Furthermore, we demonstrate how AutoAblation facilitates the definition of ablation studies and enables the asynchronous parallel execution of ablation trials.

Place, publisher, year, edition, pages
MDPI AG, 2022
Keywords
Hopsworks, Copernicus, Earth Observation, machine learning, deep learning, artificial intelligence, model serving, big data, ablation studies, Maggy, ExtremeEarth
HSV category
Identifiers
urn:nbn:se:kth:diva-311886 (URN), 10.3390/rs14081889 (DOI), 000787403900001 (), 2-s2.0-85129027995 (Scopus ID)
Note

QC 20220506

Available from: 2022-05-06 Created: 2022-05-06 Last updated: 2023-08-28 Bibliographically checked
Armgarth, A., Pantzare, S., Arven, P., Lassnig, R., Jinno, H., Gabrielsson, E. O., . . . Berggren, M. (2021). A digital nervous system aiming toward personalized IoT healthcare. Scientific Reports, 11(1), Article ID 7757.
2021 (English) In: Scientific Reports, E-ISSN 2045-2322, Vol. 11, no. 1, article id 7757. Article in journal (Refereed) Published
Abstract [en]

Body area networks (BANs), cloud computing, and machine learning are platforms that can potentially enable advanced healthcare outside the hospital. By applying distributed sensors and drug delivery devices on/in our body and connecting to such communication and decision-making technology, a system for remote diagnostics and therapy is achieved with additional autoregulation capabilities. Challenges with such autarchic on-body healthcare schemes relate to integrity and safety, and interfacing and transduction of electronic signals into biochemical signals, and vice versa. Here, we report a BAN, comprising flexible on-body organic bioelectronic sensors and actuators utilizing two parallel pathways for communication and decision-making. Data, recorded from strain sensors detecting body motion, are both securely transferred to the cloud for machine learning and improved decision-making, and sent through the body using a secure body-coupled communication protocol to auto-actuate delivery of neurotransmitters, all within seconds. We conclude that both highly stable and accurate sensing-from multiple sensors-are needed to enable robust decision making and limit the frequency of retraining. The holistic platform resembles the self-regulatory properties of the nervous system, i.e., the ability to sense, communicate, decide, and react accordingly, thus operating as a digital nervous system.

Place, publisher, year, edition, pages
Nature Research, 2021
HSV category
Identifiers
urn:nbn:se:kth:diva-296428 (URN), 10.1038/s41598-021-87177-z (DOI), 000639562100077 (), 33833303 (PubMedID), 2-s2.0-85104084403 (Scopus ID)
Note

QC 20210614

Available from: 2021-06-14 Created: 2021-06-14 Last updated: 2022-09-15 Bibliographically checked
Sheikholeslami, S., Meister, M., Wang, T., Payberah, A. H., Vlassov, V. & Dowling, J. (2021). AutoAblation: Automated Parallel Ablation Studies for Deep Learning. In: EuroMLSys '21: Proceedings of the 1st Workshop on Machine Learning and Systems. Paper presented at The 1st Workshop on Machine Learning and Systems (EuroMLSys '21) (pp. 55-61). Association for Computing Machinery
2021 (English) In: EuroMLSys '21: Proceedings of the 1st Workshop on Machine Learning and Systems, Association for Computing Machinery, 2021, pp. 55-61. Conference paper, Published paper (Refereed)
Abstract [en]

Ablation studies provide insights into the relative contribution of different architectural and regularization components to machine learning models' performance. In this paper, we introduce AutoAblation, a new framework for the design and parallel execution of ablation experiments. AutoAblation provides a declarative approach to defining ablation experiments on model architectures and training datasets, and enables the parallel execution of ablation trials. This reduces the execution time and allows more comprehensive experiments by exploiting larger amounts of computational resources. We show that AutoAblation can provide near-linear scalability by performing an ablation study on the modules of the Inception-v3 network trained on the TenGeoPSAR dataset.  
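The declarative approach described above can be sketched as a spec that expands into one trial configuration per ablated component, plus an unmodified baseline, ready for parallel execution. Field names here are illustrative, not AutoAblation's actual API:

```python
# Sketch of expanding a declarative ablation spec into independent trial
# configs. Each trial can then be dispatched to a worker and run in parallel.
from itertools import chain

def expand_ablation_spec(base: dict, model_ablations: list, feature_ablations: list):
    """Return the baseline config plus one config per ablated component."""
    trials = [dict(base, ablated=None)]  # baseline: nothing removed
    for component in chain(model_ablations, feature_ablations):
        trials.append(dict(base, ablated=component))
    return trials

trials = expand_ablation_spec(
    base={"lr": 0.1},
    model_ablations=["conv_block_3"],   # hypothetical module name
    feature_ablations=["featA"],        # hypothetical feature name
)
```

Because the trials are independent, adding workers shortens the wall-clock time of the whole study, which is the near-linear scalability the abstract reports.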

Place, publisher, year, edition, pages
Association for Computing Machinery, 2021
Keywords
Ablation Studies, Deep Learning, Feature Ablation, Model Ablation, Parallel Trial Execution
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-294424 (URN), 10.1145/3437984.3458834 (DOI), 000927844400008 (), 2-s2.0-85106034900 (Scopus ID)
Conference
The 1st Workshop on Machine Learning and Systems (EuroMLSys '21)
Funder
EU, Horizon 2020
Note

QC 20210527

Available from: 2021-05-17 Created: 2021-05-17 Last updated: 2025-03-04 Bibliographically checked
Ismail, M., Niazi, S., Sundell, M., Ronstrom, M., Haridi, S. & Dowling, J. (2020). Distributed Hierarchical File Systems strike back in the Cloud. In: 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). Paper presented at the 40th IEEE International Conference on Distributed Computing Systems (ICDCS), Nov 29 - Dec 1, 2020, held online (pp. 820-830). Institute of Electrical and Electronics Engineers (IEEE)
2020 (English) In: 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Institute of Electrical and Electronics Engineers (IEEE), 2020, pp. 820-830. Conference paper, Published paper (Refereed)
Abstract [en]

Cloud service providers have aligned on availability zones as an important unit of failure and replication for storage systems. An availability zone (AZ) has independent power, networking, and cooling systems and consists of one or more data centers. Multiple AZs in close geographic proximity form a region that can support replicated low latency storage services that can survive the failure of one or more AZs. Recent reductions in inter-AZ latency have made synchronous replication protocols increasingly viable, instead of traditional quorum-based replication protocols. We introduce HopsFS-CL, a distributed hierarchical file system with support for high-availability (HA) across AZs, backed by AZ-aware synchronously replicated metadata and AZ-aware block replication. HopsFS-CL is a redesign of HopsFS, a version of HDFS with distributed metadata, and its design involved making replication protocols and block placement protocols AZ-aware at all layers of its stack: the metadata serving, the metadata storage, and block storage layers. In experiments on a real-world workload from Spotify, we show that HopsFS-CL, deployed in HA mode over 3 AZs, reaches 1.66 million ops/s, and has similar performance to HopsFS when deployed in a single AZ, while preserving the same semantics.
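The AZ-aware block placement described above can be illustrated by a policy that spreads a block's replicas across distinct availability zones, so the block survives the loss of any one AZ. This simplified round-robin sketch is not HopsFS-CL's actual algorithm:

```python
# Sketch: place a block's replicas on nodes drawn from different availability
# zones, cycling through AZs (and through nodes within an AZ) round-robin.
def az_aware_placement(replication_factor, nodes_by_az):
    """Return one target node per replica, spread across distinct AZs first."""
    azs = sorted(nodes_by_az)               # deterministic AZ order
    counters = {az: 0 for az in azs}        # next node index within each AZ
    placement = []
    for i in range(replication_factor):
        az = azs[i % len(azs)]
        nodes = nodes_by_az[az]
        placement.append(nodes[counters[az] % len(nodes)])
        counters[az] += 1
    return placement
```

With three AZs and a replication factor of three, each copy lands in a different zone, which is what lets the file system tolerate a full-AZ failure.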

sted, utgiver, år, opplag, sider
Institute of Electrical and Electronics Engineers (IEEE), 2020
Series
IEEE International Conference on Distributed Computing Systems, ISSN 1063-6927
HSV category
Identifiers
urn:nbn:se:kth:diva-299114 (URN), 10.1109/ICDCS47774.2020.00108 (DOI), 000667971400075 (), 2-s2.0-85101968318 (Scopus ID)
Conference
40th IEEE International Conference on Distributed Computing Systems (ICDCS), Nov 29 - Dec 1, 2020, held online
Note

QC 20210803

Not duplicate with DiVA 1467134

Available from: 2021-08-03 Created: 2021-08-03 Last updated: 2022-06-25 Bibliographically checked
Ismail, M., Niazi, S., Sundell, M., Ronström, M., Haridi, S. & Dowling, J. (2020). Distributed Hierarchical File Systems strike back in the Cloud. Paper presented at the 40th IEEE International Conference on Distributed Computing Systems, November 29 - December 1, 2020, Singapore.
2020 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Cloud service providers have aligned on availability zones as an important unit of failure and replication for storage systems. An availability zone (AZ) has independent power, networking, and cooling systems and consists of one or more data centers. Multiple AZs in close geographic proximity form a region that can support replicated low latency storage services that can survive the failure of one or more AZs. Recent reductions in inter-AZ latency have made synchronous replication protocols increasingly viable, instead of traditional quorum-based replication protocols. We introduce HopsFS-CL, a distributed hierarchical file system with support for high-availability (HA) across AZs, backed by AZ-aware synchronously replicated metadata and AZ-aware block replication. HopsFS-CL is a redesign of HopsFS, a version of HDFS with distributed metadata, and its design involved making replication protocols and block placement protocols AZ-aware at all layers of its stack: the metadata serving, the metadata storage, and block storage layers. In experiments on a real-world workload from Spotify, we show that HopsFS-CL, deployed in HA mode over 3 AZs, reaches 1.66 million ops/s, and has similar performance to HopsFS when deployed in a single AZ, while preserving the same semantics.

HSV category
Identifiers
urn:nbn:se:kth:diva-280786 (URN)
Conference
40th IEEE International Conference on Distributed Computing Systems, November 29 - December 1, 2020, Singapore
Note

QC 20210120

Available from: 2020-09-14 Created: 2020-09-14 Last updated: 2022-06-25 Bibliographically checked
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-9484-6714