Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer Systems, SCS. ORCID iD: 0000-0001-8457-4105
2018 (English). In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no. 2, pp. 449-463. Article in journal (peer reviewed). Published.
Abstract [en]

Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
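For orientation only, below is a minimal, serial Python sketch of the partially collapsed Gibbs scheme the abstract describes: the document-topic proportions theta are integrated out while the topic-word distributions Phi are sampled explicitly, so that, given Phi, the topic indicators of different documents are conditionally independent and could be sampled in parallel. This is not the authors' implementation, which additionally exploits sparsity and parallelism on large corpora; the function name, toy corpus, and hyperparameter values are illustrative assumptions.

import numpy as np


def partially_collapsed_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal partially collapsed Gibbs sampler for LDA (illustration only).

    docs : list of lists of integer word ids in 0..V-1.
    Theta (document-topic proportions) is collapsed out; Phi (topic-word
    distributions) is sampled explicitly, so documents are conditionally
    independent given Phi.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # topic indicators
    ndk = np.zeros((D, K), dtype=np.int64)                  # doc-topic counts
    nkv = np.zeros((K, V), dtype=np.int64)                  # topic-word counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            nkv[z[d][i], w] += 1

    for _ in range(iters):
        # Step 1: draw Phi | z, one Dirichlet draw per topic.
        phi = np.array([rng.dirichlet(beta + nkv[k]) for k in range(K)])
        # Step 2: draw z | Phi with theta collapsed out.
        # Given Phi, this loop over documents is embarrassingly parallel.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                ndk[d, k_old] -= 1
                nkv[k_old, w] -= 1
                p = (ndk[d] + alpha) * phi[:, w]
                k_new = rng.choice(K, p=p / p.sum())
                z[d][i] = k_new
                ndk[d, k_new] += 1
                nkv[k_new, w] += 1
    return z, ndk, nkv


# Toy usage: three short documents over a vocabulary of 4 word types.
docs = [[0, 1, 2, 2], [2, 3, 3, 1], [0, 0, 1, 3]]
z, ndk, nkv = partially_collapsed_lda(docs, V=4, K=2, iters=50)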

Place, publisher, year, edition, pages
American Statistical Association, 2018. Vol. 27, no. 2, pp. 449-463
Keywords [en]
Bayesian inference, Computational complexity, Gibbs sampling, Latent Dirichlet allocation, Massive datasets, Parallel computing
HSV category
Identifiers
URN: urn:nbn:se:kth:diva-238254
DOI: 10.1080/10618600.2017.1366913
ISI: 000435688200018
Scopus ID: 2-s2.0-85046690915
OAI: oai:DiVA.org:kth-238254
DiVA id: diva2:1259974
Research funder
Swedish Foundation for Strategic Research
Note

QC 20181031

Available from: 2018-10-31. Created: 2018-10-31. Last updated: 2018-10-31. Bibliographically checked.

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text, Scopus

Person records (BETA)

Broman, David
