Ceylan, Ciwan
Publications (5 of 5)
Ceylan, C., Ghoorchian, K. & Kragic, D. (2024). Scalable Unsupervised Feature Selection with Reconstruction Error Guarantees via QMR Decomposition. In: CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. Paper presented at the 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Boise, United States of America, October 21-25, 2024 (pp. 3658-3662). Association for Computing Machinery (ACM)
Scalable Unsupervised Feature Selection with Reconstruction Error Guarantees via QMR Decomposition
2024 (English). In: CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2024, p. 3658-3662. Conference paper, Published paper (Refereed)
Abstract [en]

Unsupervised feature selection (UFS) methods have garnered significant attention for their capability to eliminate redundant features without relying on class label information. However, their scalability to large datasets remains a challenge, rendering common UFS methods impractical for such applications. To address this issue, we introduce QMR-FS, a greedy forward filtering approach that selects linearly independent features up to a specified relative tolerance, ensuring that any excluded features can be reconstructed from the retained set within this tolerance. This is achieved through the QMR matrix decomposition, which builds upon the well-known QR decomposition. QMR-FS benefits from linear complexity relative to the number of instances and boasts exceptional performance due to its ability to leverage parallelized computation on both CPU and GPU. Despite its greedy nature, QMR-FS achieves comparable classification and clustering accuracies across multiple datasets when compared to other UFS methods, while achieving runtimes approximately 10 times faster than recently proposed scalable UFS methods for datasets ranging from 100 million to 1 billion elements.
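
The selection rule in the abstract can be made concrete with a short sketch: keep a feature only if its residual, after projecting onto the span of the already-kept features, exceeds a relative tolerance; otherwise it is reconstructible from the kept set within that tolerance. This is a minimal Gram-Schmidt-style illustration, not the paper's QMR decomposition, and the function name greedy_independent_features and the rel_tol parameter are illustrative assumptions.

```python
import numpy as np

def greedy_independent_features(X, rel_tol=0.1):
    """Keep a feature only if its residual, after projection onto the
    span of already-kept features, exceeds rel_tol (relative to the
    feature's norm). Excluded features are then reconstructible from
    the kept set within rel_tol. Gram-Schmidt sketch, not QMR itself."""
    n, d = X.shape
    kept = []
    Q = np.empty((n, 0))  # orthonormal basis for the kept features' span
    for j in range(d):
        x = X[:, j]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # an all-zero feature carries no information
        r = x - Q @ (Q.T @ x)  # residual orthogonal to the kept span
        if np.linalg.norm(r) / norm_x > rel_tol:
            kept.append(j)
            Q = np.column_stack([Q, r / np.linalg.norm(r)])
    return kept

# Toy check: the third column is (almost) the sum of the first two,
# so at a 10% tolerance it should be filtered out.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 2))
X = np.column_stack([A, A.sum(axis=1) + 1e-3 * rng.normal(size=1000)])
print(greedy_independent_features(X, rel_tol=0.1))  # expected: [0, 1]
```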

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
feature selection, linear independence, scalability, unsupervised learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-357143 (URN)
10.1145/3627673.3679994 (DOI)
2-s2.0-85210013171 (Scopus ID)
Conference
33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Boise, United States of America, October 21-25, 2024
Note

Part of ISBN 9798400704369

QC 20241205

Available from: 2024-12-04. Created: 2024-12-04. Last updated: 2024-12-05. Bibliographically approved.
Ceylan, C., Franzén, S. & Pokorny, F. T. (2021). Learning Node Representations Using Stationary Flow Prediction on Large Payment and Cash Transaction Networks. In: Meila, M. & Zhang, T. (Eds.), International Conference on Machine Learning, Vol. 139. Paper presented at the International Conference on Machine Learning (ICML), July 18-24, 2021, online. JMLR-Journal Machine Learning Research, 139
Learning Node Representations Using Stationary Flow Prediction on Large Payment and Cash Transaction Networks
2021 (English). In: International Conference on Machine Learning, Vol. 139 / [ed] Meila, M. & Zhang, T., JMLR-Journal Machine Learning Research, 2021, Vol. 139. Conference paper, Published paper (Refereed)
Abstract [en]

Banks are required to analyse large transaction datasets as a part of the fight against financial crime. Today, this analysis is either performed manually by domain experts or using expensive feature engineering. Gradient flow analysis allows for basic representation learning, as node potentials can be inferred directly from network transaction data. However, the gradient model has a fundamental limitation: it cannot represent all types of network flows. Furthermore, standard methods for learning the gradient flow are not appropriate for flow signals that span multiple orders of magnitude and contain outliers, i.e., transaction data. In this work, the gradient model is extended to a gated version, and we prove that it, unlike the gradient model, is a universal approximator for flows on graphs. To tackle the mentioned challenges of transaction data, we propose a multi-scale and outlier-robust loss function based on the Student-t log-likelihood. Ethereum transaction data is used for evaluation, and the gradient models outperform MLP models using hand-engineered and node2vec features in terms of relative error. These results extend to 60 synthetic datasets, with experiments also showing that the gated gradient model learns qualitative information about the underlying synthetic generative flow distributions.
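
To see the limitation the abstract refers to, note that flows produced by a node potential always sum to zero around any cycle, so a pure circulation cannot be represented; a per-edge gate adds the missing freedom. The sketch below is a hypothetical reading of the gated idea, with illustrative function names; the paper's exact parameterisation may differ.

```python
import numpy as np

def gradient_flow(phi, edges):
    """Gradient model: the flow on edge (i, j) is phi[i] - phi[j].
    Flows of this form always sum to zero around any cycle, so a pure
    circulation has no gradient representation."""
    src, dst = edges[:, 0], edges[:, 1]
    return phi[src] - phi[dst]

def gated_gradient_flow(phi, gate, edges):
    """Hypothetical gated variant: a learned per-edge gate scales the
    potential difference, lifting the cycle constraint. Illustrative
    only; the paper's exact parameterisation may differ."""
    src, dst = edges[:, 0], edges[:, 1]
    return gate * (phi[src] - phi[dst])

# On a 3-cycle, gradient flows sum to zero around the loop ...
edges = np.array([[0, 1], [1, 2], [2, 0]])
phi = np.array([2.0, 1.0, 0.0])
print(gradient_flow(phi, edges).sum())          # 0.0, always
# ... while gates break that constraint, allowing circulations.
print(gated_gradient_flow(phi, np.array([1.0, 1.0, 0.5]), edges).sum())
```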

Place, publisher, year, edition, pages
JMLR-JOURNAL MACHINE LEARNING RESEARCH, 2021
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-303379 (URN)
000683104601036 (ISI)
Conference
International Conference on Machine Learning (ICML), July 18-24, 2021, online
Note

QC 20211015

Available from: 2021-10-15. Created: 2021-10-15. Last updated: 2022-06-25. Bibliographically approved.
Ceylan, C., Franzén, S. & Pokorny, F. T. (2021). Learning Node Representations Using Stationary Flow Prediction on Large Payment and Cash Transaction Networks. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021. Paper presented at the 38th International Conference on Machine Learning, ICML 2021, virtual/online, July 18-24, 2021 (pp. 1395-1406). ML Research Press
Learning Node Representations Using Stationary Flow Prediction on Large Payment and Cash Transaction Networks
2021 (English). In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, ML Research Press, 2021, p. 1395-1406. Conference paper, Published paper (Refereed)
Abstract [en]

Banks are required to analyse large transaction datasets as a part of the fight against financial crime. Today, this analysis is either performed manually by domain experts or using expensive feature engineering. Gradient flow analysis allows for basic representation learning, as node potentials can be inferred directly from network transaction data. However, the gradient model has a fundamental limitation: it cannot represent all types of network flows. Furthermore, standard methods for learning the gradient flow are not appropriate for flow signals that span multiple orders of magnitude and contain outliers, i.e., transaction data. In this work, the gradient model is extended to a gated version, and we prove that it, unlike the gradient model, is a universal approximator for flows on graphs. To tackle the mentioned challenges of transaction data, we propose a multi-scale and outlier-robust loss function based on the Student-t log-likelihood. Ethereum transaction data is used for evaluation, and the gradient models outperform MLP models using hand-engineered and node2vec features in terms of relative error. These results extend to 60 synthetic datasets, with experiments also showing that the gated gradient model learns qualitative information about the underlying synthetic generative flow distributions.
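
The outlier-robust loss mentioned in the abstract can be illustrated with the Student-t negative log-likelihood, whose logarithmic tail penalty keeps single large residuals from dominating training. This is a generic single-scale sketch with assumed parameters nu and scale; the paper's multi-scale construction is not reproduced here.

```python
import numpy as np
from scipy.special import gammaln

def student_t_nll(residuals, nu=3.0, scale=1.0):
    """Negative log-likelihood of residuals under a Student-t density.
    Large residuals are penalised only logarithmically, unlike the
    quadratic penalty of a Gaussian loss, so outliers cannot dominate.
    Single-scale sketch; nu and scale are assumed hyperparameters."""
    t = residuals / scale
    log_norm = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
                - 0.5 * np.log(nu * np.pi) - np.log(scale))
    return np.sum((nu + 1) / 2 * np.log1p(t**2 / nu) - log_norm)

# One outlier blows up a squared-error loss but barely moves the t loss.
r = np.array([0.1, -0.2, 0.05, 50.0])  # last residual is an outlier
print(0.5 * np.sum(r**2))              # ~1250, dominated by the outlier
print(student_t_nll(r, nu=3.0))        # grows only logarithmically
```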

Place, publisher, year, edition, pages
ML Research Press, 2021
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-347968 (URN)
2-s2.0-85134915157 (Scopus ID)
Conference
38th International Conference on Machine Learning, ICML 2021, virtual/online, July 18-24, 2021
Note

Part of ISBN 9781713845065

QC 20240703

Available from: 2024-07-03. Created: 2024-07-03. Last updated: 2024-07-03. Bibliographically approved.
Ceylan, C. & Gutmann, M. U. (2018). Conditional Noise-Contrastive Estimation of Unnormalised Models. In: Dy, J. & Krause, A. (Eds.), 35th International Conference on Machine Learning, ICML 2018. Paper presented at the 35th International Conference on Machine Learning (ICML), July 10-15, 2018, Stockholm, Sweden (pp. 1334-1442). International Machine Learning Society (IMLS), 80
Conditional Noise-Contrastive Estimation of Unnormalised Models
2018 (English). In: 35th International Conference on Machine Learning, ICML 2018 / [ed] Dy, J. & Krause, A., International Machine Learning Society (IMLS), 2018, Vol. 80, p. 1334-1442. Conference paper, Published paper (Refereed)
Abstract [en]

Many parametric statistical models are not properly normalised and are only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noise-contrastive estimation (NCE) was introduced, in which unnormalised models are estimated by learning to distinguish between data and auxiliary noise. An open question is how best to choose the auxiliary noise distribution. We here propose a new method that addresses this issue. The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem, but in contrast to NCE, it leverages the observed data when generating noise samples. The noise can thus be generated in a semi-automated manner. We first present the underlying theory of the new method, show that score matching emerges as a limiting case, validate the method on continuous and discrete valued synthetic data, and show that we can expect an improved performance compared to NCE when the data lie on a lower-dimensional manifold. Then we demonstrate its applicability in unsupervised deep learning by estimating a four-layer neural image model.
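
A minimal sketch of the conditional-noise idea: draw noise by perturbing each observed point, here with Gaussian noise, a symmetric conditional distribution under which the classification objective reduces to a logistic loss on the model's log-ratio. The function cnce_loss, the noise_std parameter, and the toy model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cnce_loss(log_phi, theta, x, noise_std=0.5, K=1, rng=None):
    """Conditional NCE sketch: noise is a Gaussian perturbation of each
    observed point. Because this conditional noise is symmetric, the
    data-vs-noise classifier reduces to a logistic loss on the model's
    log-ratio. Names and defaults here are illustrative assumptions."""
    rng = rng or np.random.default_rng(0)
    loss = 0.0
    for _ in range(K):  # K noise samples per data point
        y = x + noise_std * rng.normal(size=x.shape)
        g = log_phi(x, theta) - log_phi(y, theta)  # log-ratio per pair
        loss += np.mean(np.logaddexp(0.0, -g))     # softplus(-g)
    return loss / K

# Toy unnormalised model: log_phi(x) = -0.5 * ||x||^2 / theta, with the
# intractable partition function never needed by the loss.
log_phi = lambda x, theta: -0.5 * np.sum(x**2, axis=1) / theta
x = np.random.default_rng(1).normal(size=(512, 2))
print(cnce_loss(log_phi, 1.0, x))  # objective to be minimised over theta
```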

Place, publisher, year, edition, pages
International Machine Learning Society (IMLS), 2018
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-318709 (URN)
000683379200075 (ISI)
2-s2.0-85057220926 (Scopus ID)
Conference
35th International Conference on Machine Learning (ICML), July 10-15, 2018, Stockholm, Sweden
Note

QC 20220922

Part of ISBN 9781510867963

Available from: 2022-09-22. Created: 2022-09-22. Last updated: 2023-09-22. Bibliographically approved.
Poklukar, P., Ceylan, C., Hultin, H., Kravchenko, O., Varava, A. & Kragic, D. GraphDCA - a Framework for Node Distribution Comparison in Real and Synthetic Graphs.
GraphDCA - a Framework for Node Distribution Comparison in Real and Synthetic Graphs
(English). Manuscript (preprint) (Other academic)
Abstract [en]

We argue that when comparing two graphs, the distribution of node structural features is more informative than global graph statistics which are often used in practice, especially to evaluate graph generative models. Thus, we present GraphDCA - a framework for evaluating similarity between graphs based on the alignment of their respective node representation sets. The sets are compared using a recently proposed method for comparing representation spaces, called Delaunay Component Analysis (DCA), which we extend to graph data. To evaluate our framework, we generate a benchmark dataset of graphs exhibiting different structural patterns and show, using three node structure feature extractors, that GraphDCA recognizes graphs with both similar and dissimilar local structure. We then apply our framework to evaluate three publicly available real-world graph datasets and demonstrate, using gradual edge perturbations, that GraphDCA satisfyingly captures gradually decreasing similarity, unlike global statistics. Finally, we use GraphDCA to evaluate two state-of-the-art graph generative models, NetGAN and CELL, and conclude that further improvements are needed for these models to adequately reproduce local structural features.
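
A rough sketch of the pipeline the abstract describes: extract per-node structural features from each graph, then compare the two feature sets rather than global statistics. The nearest-neighbour set distance below is only a crude stand-in for the Delaunay Component Analysis alignment that GraphDCA actually uses, and the feature extractor shown is one illustrative choice.

```python
import numpy as np
import networkx as nx

def node_structural_features(G):
    """One illustrative node-feature extractor: degree and local
    clustering coefficient per node (the paper evaluates three)."""
    deg = dict(G.degree())
    clu = nx.clustering(G)
    return np.array([[deg[v], clu[v]] for v in G.nodes()])

def feature_set_distance(F1, F2):
    """Crude proxy for comparing two node-feature sets: symmetric
    average nearest-neighbour distance. GraphDCA instead aligns the
    sets with Delaunay Component Analysis; this is only a sketch of
    comparing node distributions rather than global statistics."""
    d12 = np.mean([np.min(np.linalg.norm(F2 - f, axis=1)) for f in F1])
    d21 = np.mean([np.min(np.linalg.norm(F1 - f, axis=1)) for f in F2])
    return 0.5 * (d12 + d21)

# Graphs from the same generative pattern should typically score closer
# than graphs with different local structure.
G1 = nx.barabasi_albert_graph(200, 3, seed=0)
G2 = nx.barabasi_albert_graph(200, 3, seed=1)  # same pattern as G1
G3 = nx.erdos_renyi_graph(200, 0.03, seed=0)   # different local structure
F1, F2, F3 = map(node_structural_features, (G1, G2, G3))
print(feature_set_distance(F1, F2))  # typically smaller
print(feature_set_distance(F1, F3))  # typically larger
```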

Keywords
Representation Learning, Machine Learning, Graph Generative Models, Node Embeddings
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-312720 (URN)
Note

QC 20220614

Available from: 2022-05-20. Created: 2022-05-20. Last updated: 2022-06-25. Bibliographically approved.