Representation Learning and Parallelization for Machine Learning Applications with Graph, Tabular, and Time-Series Data
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
ORCID iD: 0000-0003-0422-6560
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Machine Learning (ML) models have achieved significant success in representation learning across domains like vision, language, graphs, and tabular data. Constructing effective ML models hinges on several critical considerations: (1) data representation: how to represent the input data in a meaningful and effective way; (2) learning objectives: how to define the desired prediction target in a specific downstream task; (3) model architecture: which representation learning model architecture, i.e., the type of neural network, is the most appropriate for the given downstream task; and (4) training strategy: how to effectively train the selected ML model for better feature extraction and representation quality.

This thesis explores representation learning and parallelization in machine learning, addressing how to boost model accuracy and reduce training time. Our research investigates several innovative approaches to improving the efficiency and effectiveness of ML applications on graph, tabular, and time-series data, with contributions to areas such as combinatorial optimization, parallel training, and ML methods across these data types. First, we explore representation learning in combinatorial optimization and integrate a constraint-based exact solver with a predictive ML model to enhance problem-solving efficiency. We demonstrate that combining an exact solver with a predictive model that estimates optimal solution costs significantly reduces the search space and accelerates solution times. Second, we employ graph Transformer models to leverage topological and semantic node similarities in the input data, resulting in superior node representations and improved downstream task performance. Third, we empirically study the choice of model architecture for learning from tabular data. We showcase the application of tabular Transformer models to large datasets, revealing their ability to create features with high predictive power. Fourth, we utilize Transformer models for detailed user behavior modeling from time-series data, illustrating their effectiveness in capturing fine-grained patterns. Finally, we turn to the training strategy and investigate graph traversal strategies to improve device placement in deep learning model parallelization, showing that an optimized traversal order enhances parallel training speed. Collectively, these findings advance the understanding and application of representation learning and parallelization in diverse ML contexts.

This thesis enhances representation learning and parallelization in ML models, addressing key challenges in representation quality. Our methods advance combinatorial optimization, parallel training, and ML on graph, tabular, and time-series data. Additionally, our findings contribute to understanding Transformer models, leading to more accurate predictions and improved performance across various domains.


Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 55.
Series
TRITA-EECS-AVL ; 2024:72
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-353825
ISBN: 978-91-8106-051-5 (print)
OAI: oai:DiVA.org:kth-353825
DiVA, id: diva2:1900636
Public defence
2024-10-21, Sal C, Electrum, Kistagången 16, https://kth-se.zoom.us/s/63322131109, Stockholm, 09:00 (English)
Available from: 2024-09-24. Created: 2024-09-24. Last updated: 2024-09-24. Bibliographically approved.
List of papers
1. CONVJSSP: Convolutional Learning for Job-Shop Scheduling Problems
2020 (English). In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020, p. 1483-1490. Conference paper, Published paper (Refereed)
Abstract [en]

The Job-Shop Scheduling Problem (JSSP) is a well-known optimization problem with plenty of existing solutions. Although remarkable progress has been made in addressing the problem, most of the solutions require input from human experts. Deep Learning techniques, on the other hand, have proven successful in acquiring knowledge from data without step-by-step instructions from humans. In this work, we propose a novel solution, called ConvJSSP, that applies Deep Learning to speed up the solving process of JSSPs and to reduce the need for human involvement. In ConvJSSP, we train a Convolutional Neural Network model to predict the optimal makespan of JSSPs and use the predicted makespan to accelerate the JSSP solving scheme. Through experiments, we compare several JSSP solving methods based on the ConvJSSP approach against a state-of-the-art solution as a baseline, and show that ConvJSSP speeds up problem solving by up to 9% compared to the baseline method.
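To make the core idea concrete, here is a minimal sketch of how a predicted makespan can tighten an exact solver's search. It assumes OR-Tools CP-SAT as the constraint-programming solver; the `predicted_makespan` argument stands in for the CNN's output, and the 5% safety margin is illustrative, not the paper's exact scheme.

```python
# Sketch: using a predicted optimal makespan to prune a CP solver's search.
from ortools.sat.python import cp_model

def solve_jssp(jobs, predicted_makespan=None):
    """jobs: list of jobs, each a list of (machine, duration) operations."""
    model = cp_model.CpModel()
    horizon = sum(d for job in jobs for _, d in job)  # trivial upper bound
    machine_intervals = {}
    job_ends = []
    for j, job in enumerate(jobs):
        prev_end = None
        for o, (machine, dur) in enumerate(job):
            start = model.NewIntVar(0, horizon, f"s_{j}_{o}")
            end = model.NewIntVar(0, horizon, f"e_{j}_{o}")
            iv = model.NewIntervalVar(start, dur, end, f"iv_{j}_{o}")
            machine_intervals.setdefault(machine, []).append(iv)
            if prev_end is not None:
                model.Add(start >= prev_end)  # precedence within a job
            prev_end = end
        job_ends.append(prev_end)
    for ivs in machine_intervals.values():
        model.AddNoOverlap(ivs)  # one operation at a time per machine
    makespan = model.NewIntVar(0, horizon, "makespan")
    model.AddMaxEquality(makespan, job_ends)
    if predicted_makespan is not None:
        # The predicted optimum (plus a margin) caps the objective, so the
        # solver never explores schedules worse than the bound.
        model.Add(makespan <= int(predicted_makespan * 1.05))
    model.Minimize(makespan)
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return solver.Value(makespan)
    return None
```

The tighter the predicted bound, the more of the search tree the solver can discard up front, which is where the reported speedup comes from.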

Keywords
Deep Learning, Convolutional Neural Networks, Constraint Programming, Job-Shop Scheduling Problem
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-294562 (URN)
10.1109/ICMLA51294.2020.00229 (DOI)
2-s2.0-85102527991 (Scopus ID)
Conference
2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
Available from: 2021-05-17. Created: 2021-05-17. Last updated: 2024-09-24. Bibliographically approved.
2. Accelerate Model Parallel Deep Learning Training Using Effective Graph Traversal Order in Device Placement
2022 (English). In: Distributed Applications and Interoperable Systems (DAIS 2022) / [ed] Eyers, D., Voulgaris, S., Springer Nature, 2022, Vol. 13272, p. 114-130. Conference paper, Published paper (Refereed)
Abstract [en]

Modern neural networks require long training times to reach decent performance on massive datasets. One common approach to speeding up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making, traversing neural network graphs and assigning their neurons to different devices. This work studies the impact of neural network graph traversal orders on device placement. In particular, we empirically study how different graph traversal orders of neural networks lead to different device placements, which in turn affect the training time of the neural network. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. We also provide recommendations on choosing effective graph traversal orders in device placement for various neural network families to improve training time in model parallelization.
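As a rough illustration of why traversal order matters, the following sketch shows how BFS and DFS orders over the same computation graph can yield different greedy placements. The toy graph, per-op costs, and least-loaded placement rule are all illustrative, not the paper's setup.

```python
# Sketch: different traversal orders of a computation graph can produce
# different device placements under the same greedy assignment rule.
from collections import deque

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}  # op -> successors
cost = {"a": 4, "b": 3, "c": 2, "d": 5}                     # e.g., memory per op

def bfs_order(g, root):
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in g[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order

def dfs_order(g, root):
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(g[node]))  # visit successors left-to-right
    return order

def greedy_place(order, n_devices=2):
    """Assign each op, in traversal order, to the least-loaded device."""
    load = [0] * n_devices
    placement = {}
    for op in order:
        dev = min(range(n_devices), key=lambda d: load[d])
        placement[op] = dev
        load[dev] += cost[op]
    return placement

print(greedy_place(bfs_order(graph, "a")))  # placement under BFS order
print(greedy_place(dfs_order(graph, "a")))  # may differ under DFS order
```

Because the placement policy consumes ops sequentially, the order in which ops arrive changes which device is least loaded at each step, and hence the final placement and training time.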

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Device Placement, Model Parallelization, Deep Learning, Graph Traversal Order
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-320676 (URN)
10.1007/978-3-031-16092-9_8 (DOI)
000866213900008 (ISI)
2-s2.0-85137997061 (Scopus ID)
Conference
22nd IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS), held as part of the 17th International Federated Conference on Distributed Computing Techniques (DisCoTec), June 13-17, 2022, Lucca, Italy
Note

Part of proceedings: ISBN 978-3-031-16092-9, ISBN 978-3-031-16091-2

Available from: 2022-10-31. Created: 2022-10-31. Last updated: 2024-09-24. Bibliographically approved.
3. Node Context Selection in Transformer-Based Graph Representation Learning Models
2022 (English). In: Proceedings: 2022 IEEE International Conference on Big Data, Big Data 2022, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 4625-4634. Conference paper, Published paper (Refereed)
Abstract [en]

Transformer models have great potential in Graph Representation Learning (GRL): they scale the learning process efficiently on large datasets and address many challenges found in Graph Neural Networks, e.g., oversmoothing and suspended animation. To represent each node of a graph, Transformer models usually take as input a node together with its node context, i.e., a set of other nodes that serve as the learning context for the target node. However, current GRL Transformer models mainly consider the graph topology when selecting the node context for each target node. In this work, we demonstrate the important role of node features in selecting the node context. Specifically, we propose a hybrid approach for selecting the node context that considers both the graph topology and the semantic similarities between node features. Through empirical evaluations, we show the advantages of our hybrid node context selection method on a downstream classification task across various datasets, compared to selection methods that consider only graph topology or only semantic similarities. The best classification accuracy improvements of our proposed hybrid methods over the baseline methods range from 0.77% to 6.05% per dataset.
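A minimal sketch of one way such a hybrid selection could look, assuming a personalized-PageRank-style topological score (as used in Graph-Bert-like models) combined with cosine similarity of node features. The mixing weight `alpha`, the normalization, and all numeric choices are illustrative, not the paper's exact method.

```python
# Sketch: hybrid node context selection mixing topological intimacy and
# semantic (feature) similarity for a target node.
import numpy as np

def hybrid_context(adj, feats, target, k=5, alpha=0.5):
    """adj: (n, n) adjacency matrix; feats: (n, d) node features;
    target: index of the node whose context we select."""
    n = adj.shape[0]
    # Topological score: personalized PageRank from the target node.
    P = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    e = np.zeros(n)
    e[target] = 1.0
    ppr = e.copy()
    for _ in range(50):  # power iteration, restart probability 0.15
        ppr = 0.15 * e + 0.85 * (P.T @ ppr)
    # Semantic score: cosine similarity between feature vectors.
    f = feats / np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-12)
    cos = f @ f[target]
    # Mix the two scores (min-max normalized) and take the top-k context.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    score = alpha * norm(ppr) + (1 - alpha) * norm(cos)
    score[target] = -np.inf  # exclude the target itself
    return np.argsort(score)[-k:][::-1]
```

With `alpha=1.0` this degenerates to a topology-only selection and with `alpha=0.0` to a semantics-only one, so the two baselines in the comparison are endpoints of the same rule.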

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Graph Representation Learning, Graph-Bert, Node Context Selection, Transformer Models
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333405 (URN)
10.1109/BigData55660.2022.10020988 (DOI)
2-s2.0-85147902777 (Scopus ID)
Conference
2022 IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022
Note

Part of proceedings: ISBN 9781665480451

Available from: 2023-08-01. Created: 2023-08-01. Last updated: 2024-09-24. Bibliographically approved.
4. Graph Representation Learning with Graph Transformers in Neural Combinatorial Optimization
2023 (English). In: 2023 International Conference on Machine Learning and Applications (ICMLA), IEEE Computer Society, 2023, p. 488-495. Conference paper, Published paper (Refereed)
Abstract [en]

Neural combinatorial optimization aims to use neural networks to speed up the solving process of combinatorial optimization problems, i.e., finding, from a finite set of feasible solutions, the solution of a problem instance that minimizes a given objective function. Recently, researchers have applied convolutional neural networks to predict the optimal solution's cost (defined by the objective function) and provide it as extra input to an exact solver to speed up the solving process. In this paper, we investigate whether graph representations that explicitly model the inherent constraints of combinatorial optimization problems improve the performance of predicting the optimal solution's cost. Specifically, we use graph neural networks with neighborhood aggregation and graph Transformer models to capture and embed the knowledge in the graph representations of combinatorial optimization problems. We also propose a benchmark dataset covering the Traveling Salesman Problem (TSP) and the Job-Shop Scheduling Problem (JSSP), and through empirical evaluation we show that graph Transformer models achieve an average loss decrease of 61.05% on TSP and 66.53% on JSSP compared to the baseline convolutional neural networks.
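For illustration, a minimal PyTorch sketch of the regression setup: predicting a scalar optimal-solution cost from a TSP instance's city coordinates with a single self-attention layer over the nodes. This generic attention layer is a stand-in for the graph Transformer models evaluated in the paper; the architecture, sizes, and random inputs are placeholders.

```python
# Sketch: regressing the optimal tour cost of a TSP instance from its node
# set with one self-attention layer (stand-in for a graph Transformer).
import torch
import torch.nn as nn

class CostRegressor(nn.Module):
    def __init__(self, d_in=2, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(d_in, d_model)           # embed city coordinates
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)               # scalar cost estimate

    def forward(self, coords):
        """coords: (batch, n_cities, 2) coordinates of each TSP instance."""
        h = self.embed(coords)
        h, _ = self.attn(h, h, h)        # each city attends to all others
        return self.head(h.mean(dim=1))  # pooled instance embedding -> cost

model = CostRegressor()
coords = torch.rand(8, 20, 2)   # a batch of random 20-city instances
pred_cost = model(coords)       # (8, 1) predicted optimal tour costs
```

Training such a model with an MSE loss against known optimal costs yields the predictor whose output can then bound an exact solver, as in the ConvJSSP pipeline above.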

Place, publisher, year, edition, pages
IEEE Computer Society, 2023
Keywords
Representation Learning, Knowledge Engineering, Costs, Predictive Models, Transformers, Linear Programming, Graph Representation Learning, Combinatorial Optimization, Graph Transformer, Job-Shop Scheduling Problem, Traveling Salesman Problem
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-345593 (URN)
10.1109/ICMLA58977.2023.00074 (DOI)
2-s2.0-85190105518 (Scopus ID)
Conference
2023 International Conference on Machine Learning and Applications (ICMLA), December 15-17, 2023, Jacksonville, FL, USA
Note

Part of proceedings: ISBN 979-8-3503-4534-6, ISBN 979-8-3503-1891-3

Available from: 2024-04-12. Created: 2024-04-12. Last updated: 2024-09-24. Bibliographically approved.
5. Mind the Data, Measuring the Performance Gap Between Tree Ensembles and Deep Learning on Tabular Data
2024 (English). In: Advances in Intelligent Data Analysis XXII - 22nd International Symposium on Intelligent Data Analysis, IDA 2024, Proceedings, Springer Nature, 2024, Vol. 14641, p. 65-76. Conference paper, Published paper (Refereed)
Abstract [en]

Recent machine learning studies on tabular data show that ensembles of decision tree models are more efficient and performant than deep learning models such as Tabular Transformers. However, as we demonstrate, these studies are limited in scope and do not paint the full picture. In this work, we focus on how two dataset properties, namely dataset size and feature complexity, affect the empirical performance comparison between tree ensembles and Tabular Transformer models. Specifically, we employ a hypothesis-driven approach and identify situations where Tabular Transformer models are expected to outperform tree ensemble models. Through empirical evaluation, we demonstrate that, given large enough datasets, deep learning models perform better than tree models. This effect becomes more pronounced when complex feature interactions exist in the given task and dataset, suggesting that one must pay careful attention to dataset properties when selecting a model for tabular data in machine learning, especially in an industrial setting, where ever larger datasets with less and less carefully engineered features are becoming routinely available.
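A sketch of the kind of protocol such a hypothesis-driven comparison implies, using scikit-learn stand-ins (a gradient-boosted tree ensemble vs. a small neural network) on synthetic data whose target depends on multiplicative feature interactions. The models, sizes, and data generator are illustrative, not the paper's setup.

```python
# Sketch: compare a tree ensemble against a neural model on synthetic tabular
# data with feature interactions, at growing dataset sizes.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def make_data(n, d=10):
    X = rng.normal(size=(n, d))
    # Target built from pairwise interactions, which are hard to capture
    # with the axis-aligned splits that tree models rely on.
    y = X[:, 0] * X[:, 1] + X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=n)
    return X, y

for n in (1_000, 10_000, 100_000):
    X, y = make_data(n)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)
    for model in (HistGradientBoostingRegressor(),
                  MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)):
        score = r2_score(yte, model.fit(Xtr, ytr).predict(Xte))
        print(f"n={n:>7}  {type(model).__name__:32s} R^2={score:.3f}")
```

Sweeping dataset size while holding the interaction structure fixed isolates the two properties the paper studies: how much data each model family needs, and how much feature complexity it can exploit.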

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743 ; 14641
Keywords
Gradient boosting, Tabular data, Tabular Transformers
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-346536 (URN)
10.1007/978-3-031-58547-0_6 (DOI)
001295919100006 (ISI)
2-s2.0-85192227414 (Scopus ID)
Conference
22nd International Symposium on Intelligent Data Analysis, IDA 2024, Stockholm, Sweden, April 24-26, 2024
Note

Part of proceedings: ISBN 978-3-031-58546-3

Available from: 2024-05-16. Created: 2024-05-16. Last updated: 2024-10-03. Bibliographically approved.
6. player2vec: A Language Modeling Approach to Understand Player Behavior in Games
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Methods for learning latent user representations from historical behavior logs have gained traction for recommendation tasks in e-commerce, content streaming, and other settings. However, this area remains relatively underexplored in video and mobile gaming contexts. In this work, we present a novel method for overcoming this limitation by extending a long-range Transformer model from the natural language processing domain to player behavior data. We discuss the specifics of behavior tracking in games and propose preprocessing and tokenization approaches that view in-game events analogously to words in sentences, thus enabling player representations to be learned in a self-supervised manner in the absence of ground-truth annotations. We experimentally demonstrate the efficacy of the proposed approach in fitting the distribution of behavior events by evaluating intrinsic language modeling metrics. Furthermore, we qualitatively analyze the emerging structure of the learned embedding space and show its value for generating insights into behavior patterns that inform downstream applications.
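To illustrate the events-as-words view, a minimal sketch of the tokenization step: sessions become "sentences" of event tokens, ready for a masked-language-model objective. The event names, special tokens, and fixed-length encoding are hypothetical, in the style of BERT-like preprocessing rather than the paper's exact pipeline.

```python
# Sketch: turning in-game event logs into language-model training sequences.
# Event names below are hypothetical placeholders.
sessions = [
    ["login", "match_start", "item_pickup", "match_end"],
    ["login", "store_open", "purchase", "match_start", "match_end"],
]

# Build a vocabulary over event types, reserving ids for special tokens.
specials = ["[PAD]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {tok: i for i, tok in enumerate(specials)}
for session in sessions:
    for event in session:
        vocab.setdefault(event, len(vocab))

def encode(session, max_len=16):
    """Convert one session into a fixed-length id sequence, BERT-style."""
    ids = [vocab["[CLS]"]] + [vocab[e] for e in session] + [vocab["[SEP]"]]
    ids = ids[:max_len]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))

batch = [encode(s) for s in sessions]  # ready for a masked-LM objective
```

Because the training signal comes from predicting masked or next events, no ground-truth labels about the players are needed, which is what makes the representation learning self-supervised.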

National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-353823 (URN)
Available from: 2024-09-24. Created: 2024-09-24. Last updated: 2024-09-25. Bibliographically approved.

Open Access in DiVA

summary (1568 kB)
File information:
File name: SUMMARY01.pdf
File size: 1568 kB
Checksum (SHA-512): c3995587099afd533a753057b6e81ae00e3987716fa082b70393711d2d362f83d488683c982cd95804594f6c68c5792f4f6525db72c137e44c61c6b7f60dbc90
Type: summary
Mimetype: application/pdf

Search in DiVA

By author/editor: Wang, Tianze
By organisation: Software and Computer systems, SCS
On the subject: Computer Sciences
