Finding Clusters of Similar Artists – Analysis of DBSCAN and K-Means Clustering.
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2012 (English)
Independent thesis Advanced level (professional degree), 10 credits / 15 HE credits
Student thesis
Abstract [en]

We have applied k-means clustering and DBSCAN to the problem of finding sets of similar artists, based on a large number of artists and their genres. For our experiments we used data from the Million Song Dataset, a freely available collection of metadata for one million popular music tracks, created specifically for research. We ran the algorithms with varying parameter values and studied the effects. The resulting clusters were analyzed, and for k-means we found three different types of clusters. Although the results from k-means were quite noisy, many of the clusters could be used to gain some insight into the similarity between artists. This implied that using distances as a representation of similarity between artists is viable. DBSCAN did not prove as useful, because its clustering method is density-based and the densities of the clusters in the input data differed far too much for DBSCAN to handle. We found that more features in the input data, such as genre per track, would be desirable and would probably improve the results of the algorithms. Further study of other clustering algorithms applied to the same data would shed light on the actual effectiveness of the algorithms studied here.
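As a rough illustration of the k-means setup described in the abstract, the sketch below clusters a handful of hypothetical artists by their genre tags, using Euclidean distance between genre vectors as the (dis)similarity measure. The toy `ARTISTS` data, the binary genre encoding, the farthest-first seeding and all function names are assumptions made for illustration only; the abstract does not state how the thesis encoded genres or initialized k-means.

```python
import math

# Hypothetical toy data, NOT from the thesis: each artist is a set of genre tags.
ARTISTS = {
    "artist_a": {"rock", "hard rock"},
    "artist_b": {"rock", "hard rock", "metal"},
    "artist_c": {"hip hop", "rap"},
    "artist_d": {"hip hop", "rap", "soul"},
    "artist_e": {"jazz", "bebop"},
    "artist_f": {"jazz", "swing"},
}


def to_vectors(artists):
    """Encode each artist as a 0/1 vector over the sorted genre vocabulary (assumed encoding)."""
    vocab = sorted({g for tags in artists.values() for g in tags})
    names = sorted(artists)
    vecs = [[1.0 if g in artists[n] else 0.0 for g in vocab] for n in names]
    return names, vecs


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: Euclidean distance stands in for artist (dis)similarity.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


if __name__ == "__main__":
    names, vecs = to_vectors(ARTISTS)
    labels = kmeans(vecs, k=3)
    for c in sorted(set(labels)):
        print(c, [n for n, lab in zip(names, labels) if lab == c])
```

On this toy input the rock, hip hop and jazz artists end up in three separate clusters. Farthest-first seeding is used here only to make the result deterministic; random or k-means++ seeding is the more common choice in practice.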

Abstract [sv] (English translation)

We have applied k-means clustering and DBSCAN to the problem of finding groups of similar artists based on a large number of artists and their genres. For our experiments we used data from the Million Song Dataset, a freely available collection of metadata for one million popular songs, created specifically for research. We ran the algorithms with varying values for their parameters and studied the effects. The resulting clusters were analyzed, and for k-means we found three different types of clusters. Although the results from k-means contained quite a lot of noise, many of the clusters could be used to gain some insight into the similarity between artists. This implies that distance can be used as a representation of similarity between artists. The results from DBSCAN did not prove as useful. This was because its clustering method is density-based, and the density of the clusters in the input data differed too much for DBSCAN to be able to find them. We found that more features in the input data, such as genre per track, would be desirable and would probably improve the results of the algorithms. Further studies of other clustering algorithms applied to the same data would shed light on the actual effectiveness of the algorithms studied here.
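The density problem reported for DBSCAN can be illustrated with a toy example: one tight group and one loose group of points on a line. The coordinates, the `eps` values and `min_pts` below are hypothetical and unrelated to the thesis data; the sketch (a textbook DBSCAN written for illustration, not the authors' code) only shows why a single global `eps` struggles when cluster densities differ.

```python
import math

NOISE = -1

# Hypothetical toy data on the x-axis, NOT from the thesis:
# a tight group (spacing 1) and a loose group (spacing 10).
DENSE = [(float(x), 0.0) for x in range(5)]  # x = 0..4
SPARSE = [(float(x), 0.0) for x in (12, 22, 32, 42)]
POINTS = DENSE + SPARSE


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns a cluster id per point, or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: claimed, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


if __name__ == "__main__":
    print(dbscan(POINTS, eps=2.5, min_pts=3))   # [0, 0, 0, 0, 0, -1, -1, -1, -1]
    print(dbscan(POINTS, eps=11.0, min_pts=3))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With the small radius the loose group is discarded as noise; with the large radius the loose group is reached, but it is merged with the tight group. On this data, with `min_pts = 3`, no single `eps` value recovers the two groups as separate clusters.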

Place, publisher, year, edition, pages
2012.
Series
Kandidatexjobb CSC, K12033
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-131042
OAI: oai:DiVA.org:kth-131042
DiVA: diva2:654488
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text

Other links

http://www.csc.kth.se/utbildning/kandidatexjobb/datateknik/2012/rapport/hakansson_jacob_OCH_nordstrom_walter_K12033.pdf
By organisation
School of Computer Science and Communication (CSC)
Computer Science
