Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer Systems, SCS. ORCID iD: 0000-0001-8457-4105
2018 (English). In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no. 2, pp. 449-463. Article in journal (Refereed), Published
Abstract [en]

Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
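The paper builds on the standard collapsed Gibbs sampler for LDA, in which the topic proportions and topic-word distributions are integrated out and only the latent topic indicators are sampled. The following is a minimal single-machine sketch of that baseline sampler (not the authors' parallel, sparse, partially collapsed algorithm), assuming symmetric Dirichlet priors with hyperparameters `alpha` and `beta`; all function and variable names are illustrative.

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Baseline collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of token ids in [0, V).
    V: vocabulary size; K: number of topics.
    Returns the topic assignments z and the topic-word count matrix.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # total tokens per topic
    # Random initialization of the topic indicators, then tally counts.
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token's assignment from the counts.
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional p(z_i = k | rest) with theta, phi collapsed.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, nkw
```

Because each draw above depends on the global count matrices, the collapsed sampler is inherently sequential; the partially collapsed approach studied in the paper trades a little statistical efficiency for conditional independence that permits parallel sampling.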

Place, publisher, year, edition, pages
American Statistical Association, 2018. Vol. 27, no. 2, pp. 449-463
Keywords [en]
Bayesian inference, Computational complexity, Gibbs sampling, Latent Dirichlet allocation, Massive datasets, Parallel computing
National subject category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-238254
DOI: 10.1080/10618600.2017.1366913
ISI: 000435688200018
Scopus ID: 2-s2.0-85046690915
OAI: oai:DiVA.org:kth-238254
DiVA id: diva2:1259974
Research funder
Swedish Foundation for Strategic Research (SSF)
Note

QC 20181031

Available from: 2018-10-31. Created: 2018-10-31. Last updated: 2018-10-31. Bibliographically approved

Open Access in DiVA

Full text is not available in DiVA

Other links

Publisher's full text | Scopus

Person records (BETA)

Broman, David
