Learning and Data Selection in Big Datasets
Shokri-Ghadikolaei, Hossein. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. ORCID iD: 0000-0001-6737-0266
Ghauch, Hadi. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering; COMELEC Department, Telecom ParisTech, Paris, France. ORCID iD: 0000-0002-9442-671X
Fischione, Carlo. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. ORCID iD: 0000-0001-9810-3478
Skoglund, Mikael. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. ORCID iD: 0000-0002-7926-5081
2019 (English). In: 36th International Conference on Machine Learning, ICML 2019, 2019, p. 3848-3857. Conference paper, Published paper (Refereed)
Abstract [en]

Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of paramount importance in machine learning and distributed optimization over a network. This paper investigates the compressibility of large datasets. More specifically, we propose a framework that jointly learns the input-output mapping as well as the most representative samples of the dataset (sufficient dataset). Our analytical results show that the cardinality of the sufficient dataset increases sub-linearly with respect to the original dataset size. Numerical evaluations of real datasets reveal a large compressibility, up to 95%, without a noticeable drop in the learnability performance, measured by the generalization error.
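
The abstract describes jointly learning the input-output mapping and a small "sufficient" subset of the data. The sketch below is not the authors' algorithm; it is a minimal Python illustration of that general idea under assumed choices: a ridge-regression model, a greedy residual-based selection rule, and a hypothetical 500-sample budget. All function names and parameter values here are assumptions for illustration only.

```python
# Illustrative sketch (assumed, not the paper's method): alternate between
# fitting a model on the currently selected samples and greedily adding the
# sample that the current model explains worst.
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression on the given samples."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def select_sufficient_subset(X, y, budget, seed=0):
    """Greedily grow a subset, then refit the model on the chosen samples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]            # start from one random sample
    for _ in range(budget - 1):
        w = fit_ridge(X[selected], y[selected])  # learn the mapping on the subset
        residuals = np.abs(X @ w - y)            # per-sample misfit under that mapping
        residuals[selected] = -np.inf            # never re-pick a chosen sample
        selected.append(int(np.argmax(residuals)))
    return np.array(selected), fit_ridge(X[selected], y[selected])

# Toy run: keep 500 of 10,000 samples, i.e. a 95% reduction, mirroring the
# compressibility figure quoted in the abstract (the data here is synthetic).
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)
idx, w = select_sufficient_subset(X, y, budget=500)
print("subset size:", idx.size, "full-data MSE of subset model:", np.mean((X @ w - y) ** 2))
```

In this sketch the retained samples play the role of the sufficient dataset: the model refit on them should predict the full data nearly as well as one trained on everything. The sub-linear growth and generalization-error guarantees stated in the abstract apply to the paper's framework, not to this simplified illustration.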

Place, publisher, year, edition, pages
2019. p. 3848-3857
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
Keywords [en]
machine learning, optimization, non-convex, data compression
National Category
Computer Sciences
Research subject
Applied and Computational Mathematics, Optimization and Systems Theory; Information and Communication Technology; Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-260389
ISI: 000684034302034
Scopus ID: 2-s2.0-85078292566
OAI: oai:DiVA.org:kth-260389
DiVA id: diva2:1355470
Conference
36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, United States of America, 9 June - 15 June 2019
Funder
Swedish Research Council
Note

Part of proceedings: ISBN 978-151088698-8

QC 20230921

Available from: 2019-09-29. Created: 2019-09-29. Last updated: 2023-09-21. Bibliographically approved.

Open Access in DiVA

fulltext (392 kB), 195 downloads
File information
File name: FULLTEXT01.pdf. File size: 392 kB. Checksum: SHA-512
8b61a678900fa6013c46a550d33e529628150d18c0bb42258fb6239d84ffcca52ded02a2795e1698476bbd571b0e05a5c05383d7cbf79dd6128576936e0f8861
Type: fulltext. Mimetype: application/pdf

Authority records

Shokri-Ghadikolaei, Hossein; Ghauch, Hadi; Fischione, Carlo; Skoglund, Mikael
