Accelerate Model Parallel Deep Learning Training Using Effective Graph Traversal Order in Device Placement
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-2748-8929
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-6779-7435
2022 (English). In: Distributed Applications and Interoperable Systems (DAIS 2022) / [ed] Eyers, D.; Voulgaris, S., Springer Nature, 2022, Vol. 13272, p. 114-130. Conference paper, Published paper (Refereed).
Abstract [en]

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speeding up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making, traversing the neural network graph and assigning its neurons to different devices. This work studies the impact of neural network graph traversal orders on device placement. In particular, we empirically study how different graph traversal orders of neural networks lead to different device placements, which in turn affect the training time of the neural network. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. In this work, we also provide recommendations on choosing effective graph traversal orders in device placement for various neural network families to improve the training time in model parallelization.
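To make the idea concrete, the following Python sketch (an illustration, not the paper's implementation) treats device placement as sequential decision-making over a traversal order: a toy computation graph is linearized by breadth-first and by depth-first traversal, and each operator is then greedily assigned to the least-loaded of two devices. The graph, the per-operator costs, and the greedy policy are assumptions made for this example; the point is only that changing the traversal order changes the resulting placement.

```python
from collections import deque

# Toy computation graph (node -> successors) with per-operator compute costs.
# All names and costs are made up for illustration.
GRAPH = {
    "input":  ["conv1"],
    "conv1":  ["conv2a", "conv2b"],
    "conv2a": ["concat"],
    "conv2b": ["concat"],
    "concat": ["fc"],
    "fc":     ["loss"],
    "loss":   [],
}
COST = {"input": 1, "conv1": 4, "conv2a": 3, "conv2b": 3, "concat": 1, "fc": 5, "loss": 1}


def bfs_order(graph, source):
    """Breadth-first traversal order starting from `source`."""
    seen, order, queue = {source}, [], deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def dfs_order(graph, source):
    """Depth-first (preorder) traversal order starting from `source`."""
    seen, order, stack = set(), [], [source]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))
    return order


def greedy_placement(order, num_devices=2):
    """Assign operators, in the given traversal order, to the least-loaded device."""
    load = [0.0] * num_devices
    placement = {}
    for node in order:
        device = min(range(num_devices), key=lambda d: load[d])
        placement[node] = device
        load[device] += COST[node]
    return placement, load


for name, order in [("BFS", bfs_order(GRAPH, "input")),
                    ("DFS", dfs_order(GRAPH, "input"))]:
    placement, load = greedy_placement(order)
    print(f"{name} order: {order}")
    print(f"  placement: {placement}  per-device load: {load}")
```

On this toy graph the BFS and DFS orders already produce different assignments and different per-device loads; the paper studies the same effect empirically for real neural network families and their computation graphs.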

Place, publisher, year, edition, pages
Springer Nature, 2022. Vol. 13272, p. 114-130
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords [en]
Device Placement, Model Parallelization, Deep Learning, Graph Traversal Order
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-320676
DOI: 10.1007/978-3-031-16092-9_8
ISI: 000866213900008
Scopus ID: 2-s2.0-85137997061
OAI: oai:DiVA.org:kth-320676
DiVA, id: diva2:1707275
Conference
22nd IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS), held as part of the 17th International Federated Conference on Distributed Computing Techniques (DisCoTec), June 13-17, 2022, Lucca, Italy
Note

Part of proceedings: ISBN 978-3-031-16092-9, ISBN 978-3-031-16091-2

QC 20221031

Available from: 2022-10-31. Created: 2022-10-31. Last updated: 2024-09-24. Bibliographically approved.
In thesis
1. Representation Learning and Parallelization for Machine Learning Applications with Graph, Tabular, and Time-Series Data
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Machine Learning (ML) models have achieved significant success in representation learning across domains like vision, language, graphs, and tabular data. Constructing effective ML models hinges on several critical considerations: (1) data representation: how to represent the input data in a meaningful and effective way; (2) learning objectives: how to define the desired prediction target for a specific downstream task; (3) model architecture: which representation learning model architecture, i.e., the type of neural network, is most appropriate for the given downstream task; and (4) training strategy: how to effectively train the selected ML model for better feature extraction and representation quality.

This thesis explores representation learning and parallelization in machine learning, addressing how to improve model accuracy and reduce training time. Our research investigates several innovative approaches to improving the efficiency and effectiveness of ML applications on graph, tabular, and time-series data, with contributions to areas such as combinatorial optimization, parallel training, and ML methods across these data types. First, we explore representation learning in combinatorial optimization and integrate a constraint-based exact solver with a predictive ML model to enhance problem-solving efficiency. We demonstrate that combining an exact solver with a predictive model that estimates optimal solution costs significantly reduces the search space and accelerates solution times. Second, we employ graph Transformer models to leverage topological and semantic node similarities in the input data, resulting in superior node representations and improved downstream task performance. Third, we empirically study the choice of model architecture for learning from tabular data. We showcase the application of tabular Transformer models to large datasets, revealing their ability to create features with high predictive power. Fourth, we utilize Transformer models for detailed user behavior modeling from time-series data, illustrating their effectiveness in capturing fine-grained patterns. Finally, we turn to the training strategy and investigate graph traversal strategies to improve device placement in deep learning model parallelization, showing that an optimized traversal order improves parallel training speed. Collectively, these findings advance the understanding and application of representation learning and parallelization in diverse ML contexts.
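To illustrate the mechanism behind the first contribution, the sketch below (a hypothetical example, not the thesis's constraint-based solver or its predictive model) shows how a predicted optimal cost can shrink an exact search: the prediction seeds the pruning threshold of a branch-and-bound search over a toy two-machine scheduling problem, so partial solutions that already reach the threshold are cut off early. The job durations and the `predict_cost` stub are made up; a real system would use a learned cost model and would fall back to an unbounded search if the prediction turned out to be lower than the true optimum.

```python
# Toy problem: split jobs across two machines to minimize the makespan
# (the larger of the two machine loads). Durations are made up for illustration.
JOBS = [7, 5, 4, 3, 3, 2, 2, 1]


def predict_cost(jobs):
    """Stand-in for a learned model that estimates the optimal makespan.
    Here it simply returns the trivial lower bound ceil(total / 2)."""
    return -(-sum(jobs) // 2)


def branch_and_bound(jobs, upper_bound=float("inf")):
    """Exact depth-first search for the minimum makespan.

    `upper_bound` seeds the pruning threshold: any partial assignment whose
    makespan already reaches the threshold is cut. Returns (best, expanded),
    where best is None if no solution within the bound was found.
    """
    state = {"threshold": upper_bound + 1, "best": None, "expanded": 0}

    def recurse(i, load_a, load_b):
        state["expanded"] += 1
        if max(load_a, load_b) >= state["threshold"]:
            return  # cannot beat the incumbent / predicted bound: prune
        if i == len(jobs):
            state["best"] = max(load_a, load_b)
            state["threshold"] = state["best"]  # tighten the bound
            return
        recurse(i + 1, load_a + jobs[i], load_b)  # put job i on machine A
        recurse(i + 1, load_a, load_b + jobs[i])  # put job i on machine B

    recurse(0, 0, 0)
    return state["best"], state["expanded"]


plain_best, plain_nodes = branch_and_bound(JOBS)
bound = predict_cost(JOBS)
guided_best, guided_nodes = branch_and_bound(JOBS, upper_bound=bound)
print(f"without a bound: optimum {plain_best}, {plain_nodes} nodes expanded")
print(f"with predicted bound {bound}: optimum {guided_best}, {guided_nodes} nodes expanded")
```

Because the guided run starts with a tight threshold, it expands no more nodes than the unbounded run (and here noticeably fewer) while returning the same optimum, which illustrates the kind of search-space reduction the thesis reports for its solver-plus-predictor combination.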

This thesis enhances representation learning and parallelization in ML models, addressing key challenges in representation quality. Our methods advance combinatorial optimization, parallel training, and ML on graph, tabular, and time-series data. Additionally, our findings contribute to understanding Transformer models, leading to more accurate predictions and improved performance across various domains.

Abstract [sv]

Machine Learning (ML) models have achieved significant success in representation learning across domains such as computer vision, language, graphs, and tabular data. Constructing effective ML models depends on several critical considerations: (1) data representation: how the input data is represented in a meaningful and effective way; (2) learning objectives: how the desired prediction target is defined for a specific downstream task; (3) model architecture: which representation learning model architecture, i.e., the type of neural network, is most appropriate for the given downstream task; (4) training strategy: how to effectively train the selected ML model for better feature extraction and representation quality.

This thesis explores representation learning and parallelization in machine learning, addressing how to increase model accuracy and reduce training time. Our research investigates several innovative approaches to improving the efficiency of ML on graph, tabular, and time-series data. First, we explore representation learning in combinatorial optimization and integrate a constraint-programming-based exact solver with a predictive ML model to improve problem-solving efficiency. We show that combining an exact solver with a predictive model that estimates optimal solution costs significantly reduces the search space and the solution times. Second, we use graph Transformer models to exploit topological and semantic node similarities in the input data, resulting in superior node representations and improved performance on downstream tasks. Third, we empirically study the choice of model architecture for learning from tabular data. We demonstrate the application of tabular Transformer models to large datasets, revealing their ability to create features with high predictive power. Fourth, we use Transformer models for detailed modeling of user behavior from time-series data, illustrating their effectiveness in capturing fine-grained patterns. Finally, we delve into the training strategy and investigate graph traversal strategies to improve device placement in the parallelization of deep learning models, showing that an optimized traversal order improves parallel training speed. Together, these findings advance the understanding and application of representation learning and parallelization in diverse ML contexts.

This thesis improves representation learning and parallelization in ML models, addressing key challenges in representation quality. Our methods advance combinatorial optimization, parallel training, and ML on graph, tabular, and time-series data. In addition, our results contribute to the understanding of Transformer models, leading to more accurate predictions and improved performance across various domains.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 55
Series
TRITA-EECS-AVL ; 2024:72
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-353825 (URN)
978-91-8106-051-5 (ISBN)
Public defence
2024-10-21, Sal C, Electrum, Kistagången 16, https://kth-se.zoom.us/s/63322131109, Stockholm, 09:00 (English)
Opponent
Supervisors
Available from: 2024-09-24. Created: 2024-09-24. Last updated: 2024-09-24. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Wang, Tianze; Payberah, Amir H.; Hagos, Desta Haileselassie; Vlassov, Vladimir
