DeGPar: Large Scale Topic Detection using Node-Cut Partitioning on Dense Weighted Graphs
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. Swedish Institute of Technology (SICS). ORCID iD: 0000-0003-1007-8533
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0003-4516-7317
2017 (English). In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), IEEE conference proceedings, 2017, 775-785 p., 7980020. Conference paper, Published paper (Refereed)
Abstract [en]

Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents. Nowadays, users of Online Social Networks (OSNs) generate massive amounts of documents in the form of very short texts, tweets, and news snippets. While topic detection in its traditional form is applied to a few documents that each contain a lot of information, the problem has now changed to dealing with a massive number of documents that each contain very little information. Traditional solutions thus fall short either in scalability (due to the huge number of input items) or in handling sparsity (due to insufficient information per input item). In this paper we address the scalability problem by introducing an efficient and scalable graph-based algorithm for TD on short texts, leveraging dimensionality reduction and clustering techniques. First, we compress the input set of documents into a dense graph, such that frequent co-occurrence patterns in the documents create multiple dense topological areas in the graph. Then, we partition the graph into multiple dense sub-graphs, each representing a topic. We compare the accuracy and scalability of our solution with two state-of-the-art solutions (including the standard LDA and BiTerm). The results on two widely used benchmark datasets show that our algorithm not only maintains similar or better accuracy, but also runs an order of magnitude faster than the state-of-the-art approaches.
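
The two steps described in the abstract (build a graph from co-occurrence patterns, then split it into dense sub-graphs) can be illustrated with a toy sketch. This is not the DeGPar algorithm: it skips the dimensionality reduction step, and it replaces node-cut partitioning with a much simpler stand-in, namely dropping weak edges and taking connected components. The function names, the `min_weight` threshold, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Return {(u, v): weight}, where weight counts the documents containing both words."""
    weights = defaultdict(int)
    for doc in docs:
        # Sorting makes each undirected edge appear under a single key (u < v).
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return dict(weights)


def partition(weights, min_weight=2):
    """Group words linked by edges of weight >= min_weight.

    Stand-in for a real partitioner: connected components via union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    # Sort for a deterministic result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical short documents, in the spirit of tweets.
docs = [
    "goal match team",
    "team goal win",
    "vote election poll",
    "election vote result",
]
topics = partition(build_cooccurrence_graph(docs))
print(topics)
```

With these four documents, only the pairs (goal, team) and (election, vote) co-occur twice, so the sketch yields two word groups, one per topic. A thresholded connected-components pass is edge-based and cannot assign a word to more than one topic, which is one reason a real system would use a more capable partitioning scheme.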

Place, publisher, year, edition, pages
IEEE conference proceedings, 2017. 775-785 p., 7980020
Series
Proceedings - International Conference on Distributed Computing Systems, ISSN 1063-6927
Keyword [en]
Topic Detection, Node-cut Graph Partitioning, Distributed Algorithms, Random Indexing, Dimensionality Reduction, Dense Weighted Graph Partitioning, Online Social Networks
National Category
Computer Systems
Research subject
Computer Science; Information and Communication Technology; Applied and Computational Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-204406
DOI: 10.1109/ICDCS.2017.19
Scopus ID: 2-s2.0-85027258993
ISBN: 9781538617915 (print)
OAI: oai:DiVA.org:kth-204406
DiVA: diva2:1086963
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, J.W. Marriott Hotel, Atlanta, United States, 5 June 2017 through 8 June 2017
Note

QC 20170407

Available from: 2017-04-05 Created: 2017-04-05 Last updated: 2017-08-30 Bibliographically approved

Open Access in DiVA

fulltext (2117 kB), 10 downloads
File information
File name: FULLTEXT01.pdf
File size: 2117 kB
Checksum (SHA-512): 0ca48d9d4e7463c2cb28454971781e176775f7cea834eee82d1bd3df137e45e1baa7b9decf2fa63d265ac5490e52c7b0662586c98a94f79fafcc76e076dbc8b2
Type: fulltext
Mimetype: application/pdf

By author/editor
Ghoorchian, Kambiz; Girdzijauskas, Sarunas